SSIS MySQL ODBC - batch Insert size


anxious_ant

Original Poster:

2,626 posts

85 months

Thursday 10th November 2022
I’m a bit stumped here, so hopefully someone can help.
I need to push data from a local SQL Server to a remote MariaDB database.
I’ve managed to connect using the 32-bit MySQL ODBC connector as an ODBC destination in SSIS.

However, inserting 30k records takes over an hour.
From monitoring the inserts in MySQL Workbench, I noticed it’s inserting around 10 records every second.

I’ve set the Data Flow in SSIS to use batch mode, but I don’t think this is being applied. Is there anything I can do to improve insert performance, i.e. different connectors or settings? There is a native MariaDB ODBC connector, but that’s 64-bit only, so the DSN is not visible in Visual Studio.

In desperation I’ve also tried using linked servers and OPENQUERY for the insert, but without much improvement in performance. Pulling from the remote server is not an option here.

Hopefully someone can kindly provide some advice, as the alternative would be to send the data in CSV format over FTP…

Edit : typo in subject, meant to say SSIS

Edited by anxious_ant on Thursday 10th November 17:45

i4got

5,732 posts

84 months

Thursday 10th November 2022
I've no experience with MariaDB so can't offer any specific advice, but are there any indexes on the destination table? If so, it may be worth doing a test having dropped the indexes first (assuming you have the access level to do this).

If that makes a big difference, could your script either drop and rebuild the indexes as part of the transfer, or insert into an empty working table in MariaDB and then insert into the real table from the working table afterwards?
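The staging-table idea can be sketched like this (a minimal Python sketch using sqlite3 as a stand-in for MariaDB; the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Real table carries a unique index; the staging table has none.
cur.execute("CREATE TABLE target (id INTEGER, val TEXT)")
cur.execute("CREATE UNIQUE INDEX ux_target_id ON target (id)")
cur.execute("CREATE TABLE staging (id INTEGER, val TEXT)")

# Bulk-load the staging table first: no per-row index maintenance.
rows = [(i, f"row {i}") for i in range(1000)]
cur.executemany("INSERT INTO staging (id, val) VALUES (?, ?)", rows)

# Then one set-based insert pays the index cost in a single pass.
cur.execute("INSERT INTO target SELECT id, val FROM staging")
conn.commit()
print(cur.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 1000
```

The same pattern applies server-side in MariaDB: the per-row cost moves off the slow link and into one local INSERT…SELECT.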

buggalugs

9,243 posts

243 months

Thursday 10th November 2022
Find what’s taking the time, I guess. If the MariaDB is 100ms away in network ping, then there’s your 10/s if it’s not batching. Or maybe the connector is tearing down the connection for each request, or something daft. There’s loads of monitoring you can do at the DB end.
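The latency arithmetic works out like this (a hypothetical helper; the numbers match the 100ms ping example, assuming one network round trip per statement):

```python
def rows_per_second(rtt_ms: float, rows_per_statement: int) -> float:
    """Throughput ceiling when every statement costs one network round trip."""
    return rows_per_statement * 1000 / rtt_ms

# 100 ms ping, one row per INSERT: ~10 rows/s, matching the observed rate.
print(rows_per_second(100, 1))     # 10.0
# Batching 1,000 rows per statement lifts the same link to ~10,000 rows/s.
print(rows_per_second(100, 1000))  # 10000.0
```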

anxious_ant

Original Poster:

2,626 posts

85 months

Thursday 10th November 2022
i4got said:
I've no experience with MariaDB so can't offer any specific advice, but are there any indexes on the destination table? If so, it may be worth doing a test having dropped the indexes first (assuming you have the access level to do this).

If that makes a big difference, could your script either drop and rebuild the indexes as part of the transfer, or insert into an empty working table in MariaDB and then insert into the real table from the working table afterwards?
The destination table has one index which is the unique key.
For testing I’ve created a copy of this table and deleted all the data from the new table. I’m inserting into this empty table.

I could try removing the index on this new table and see if it helps. However, I'm not sure if I can use SSIS to insert between two remote tables. Definitely something to try though, thanks!

Didn’t realise indexes could affect insert performance.
This is the first time I’m pushing data to a remote, non-native SQL server using an ODBC connector, so it’s quite a learning process.

Timed a test insert of 39.5k records and it took 1.5 hours.
This is an incremental update; normally we’d expect 100-ish rows, but it can spike to 30k+ if there is a big update.

The current transport method is FTP, but it’s prone to breaking due to various security and file-lock issues. I’d really like to push the data directly if this is feasible.

anxious_ant

Original Poster:

2,626 posts

85 months

Thursday 10th November 2022
buggalugs said:
Find what’s taking the time, I guess. If the MariaDB is 100ms away in network ping, then there’s your 10/s if it’s not batching. Or maybe the connector is tearing down the connection for each request, or something daft. There’s loads of monitoring you can do at the DB end.
I currently use MySQL Workbench and run a “select count(*)” every second on the remote tables as a means of monitoring. Would be happy to be enlightened if there are better ways of monitoring :)
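Polling counts does at least yield a rate; a minimal sketch of turning two count(*) samples into rows/second (the sample numbers are made up):

```python
def insert_rate(t0: float, count0: int, t1: float, count1: int) -> float:
    """Rows per second between two (timestamp, row-count) samples."""
    return (count1 - count0) / (t1 - t0)

# Two polls taken 10 s apart showing 100 new rows -> 10 rows/s.
print(insert_rate(0.0, 5000, 10.0, 5100))  # 10.0
```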

buggalugs

9,243 posts

243 months

Friday 11th November 2022
anxious_ant said:
buggalugs said:
Find what’s taking the time, I guess. If the MariaDB is 100ms away in network ping, then there’s your 10/s if it’s not batching. Or maybe the connector is tearing down the connection for each request, or something daft. There’s loads of monitoring you can do at the DB end.
I currently use MySQL Workbench and run a “select count(*)” every second on the remote tables as a means of monitoring. Would be happy to be enlightened if there are better ways of monitoring :)
If you've got phpMyAdmin available, log in as MySQL root and go up to Status at the top.

Processes will show current connections and what they're doing; you'll see the app connection there, and if it's doing something, what it is and how long it's taken. If you see a single connection that's doing sweet FA a lot of the time, then it's just doing the inserts in a really inefficient way, i.e. one at a time, serially, single-threaded. If the inserts themselves are taking a long time, that's weird, so investigate. If the DB is being hammered by something else, you'll see that too.

Query statistics might be interesting, might not.

The Monitor tab, again, might be interesting.

Advisor usually has some good suggestions for optimising the DB or query setup, if that's the issue, but don't go down too many rabbit holes if raw DB speed isn't really the problem.

(I'm sure you can get to this info command-line stylee, but I just use phpMyAdmin because I have it and I don't have to remember st loads of commands.)

If you have access to the my.cnf config file, you can set up a slow query log too (if that's the issue).

If it is a one-at-a-time latency type problem, you can batch the inserts up in your code without relying on the connector to do it, i.e. insert a st load of rows in a single INSERT statement. Might be the easiest way to resolve it.
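A rough sketch of that multi-row INSERT batching (Python with sqlite3 as a stand-in for MariaDB; the table name and batch size are illustrative):

```python
import sqlite3

def batched_insert(cur, rows, batch_size=400):
    """Send many rows per INSERT statement instead of one statement per row."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Build "VALUES (?, ?), (?, ?), ..." with one pair per row.
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        flat = [value for row in batch for value in row]
        cur.execute(f"INSERT INTO t (id, val) VALUES {placeholders}", flat)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, val TEXT)")
batched_insert(cur, [(i, str(i)) for i in range(1234)])
print(cur.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1234
```

Each statement now pays the round-trip cost once per batch rather than once per row.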

I'm not a proper programmer or DBA I just end up being the one who has to sort stuff out when the real ones go 'must be the server' hehe

Edited by buggalugs on Friday 11th November 10:16

anxious_ant

Original Poster:

2,626 posts

85 months

Friday 11th November 2022
buggalugs said:
If you've got phpMyAdmin available, log in as MySQL root and go up to Status at the top.

[snip]

If it is a one-at-a-time latency type problem, you can batch the inserts up in your code without relying on the connector to do it, i.e. insert a st load of rows in a single INSERT statement. Might be the easiest way to resolve it.
Thanks! This is really helpful. I am not an admin on the server but I will pass this along.

From my research, this seems to be due to the connector and the fact that I am pushing rather than pulling data.

Querying rows via SELECT (remote MariaDB to SQL Server via the 32-bit MySQL ODBC connector) is relatively fast.

I am not a proper programmer or DBA either, but I wear many hats :)

This is my first time doing this, so I hope to get some sage advice from someone who's done something similar and can recommend best practices.

At the end of the day, if the process takes a few hours but doesn’t fail, it should be acceptable, as long as the connection doesn’t drop.
Daily rows average in the hundreds due to incremental updates, but we do get spikes of 30k+ rows every now and then.


Edited by anxious_ant on Friday 11th November 10:30

buggalugs

9,243 posts

243 months

Friday 11th November 2022
Maybe look at logging what you can from your end, then? Log timestamps in ms before and after connection setup, statement prep, statement execute, etc.

Definitely look at multiple rows per INSERT.

Good luck with it :)


maffski

1,880 posts

165 months

Friday 11th November 2022
Can you try running more than one connection and inserting records in parallel?

anxious_ant

Original Poster:

2,626 posts

85 months

Friday 11th November 2022
Spent a day researching and testing, and I believe the bottleneck is the ODBC connector. I’ve found a setting to disable caching which made a small difference, but it’s still only inserting a few records per second.

Tried using the .NET MySQL connector over an ADO.NET connection, but for some reason the password will not save in the connection. Tried various things like encrypting the project and parameterising the connection string, but none of it seems to work. Perhaps the Visual Studio version (2010) is too old.

Did a worst-case scenario study and the biggest upload would take around 4 hours (100k+ rows). Just did a test run with 3.5k rows and it took 8 minutes. This does not happen often, but I’m still not too happy with the runtime.
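A quick sanity check that the two measurements imply roughly the same per-row rate (figures taken from the post):

```python
# Measured: 3.5k rows in 8 minutes; worst case estimated at 100k+ rows.
test_rows, test_minutes = 3500, 8
rate = test_rows / (test_minutes * 60)           # ~7.3 rows/s
hours_for_100k = 100_000 / rate / 3600
print(round(rate, 1), round(hours_for_100k, 1))  # 7.3 3.8
```

So roughly 3.8 hours for 100k rows, consistent with the "around 4 hours" estimate.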

Ah well … it’s the weekend so time for some beer …

Edited by anxious_ant on Friday 11th November 17:08



anxious_ant

Original Poster:

2,626 posts

85 months

Friday 11th November 2022
maffski said:
Can you try running more than one connection and inserting records in parallel?
Not too sure how to do this in SSIS…

Dracoro

8,782 posts

251 months

Friday 11th November 2022
anxious_ant said:
The destination table has one index which is the unique key.
For testing I’ve created a copy of this table and deleted all the data from the new table. I’m inserting into this empty table.
That unique index field: are you setting it when inserting, or is it an auto-generated identity (or MariaDB equivalent) type field?

anxious_ant said:
Didn’t realise indexes could affect insert performance.
They do. Every insert requires the DB engine to update the index(es), and if a given index is clustered it's even more impactful, as the engine has to sort it (i.e. insert into the index in the right place). And if that index is not a numeric type, that sort operation is even heavier, e.g. if it is a GUID.

The above applies to SQL Server, but I suspect most DBs do it similarly.

Anyway, I would suggest doing a data transfer into a new “temporary” table with no indexes (other than maybe an identity field). That should help you better ascertain where the issue is (e.g. network, batching, DB auto-growth, etc.).
It could be quite a few things; that’s why we have DBAs :D

Edited by Dracoro on Friday 11th November 20:37

anxious_ant

Original Poster:

2,626 posts

85 months

Sunday 13th November 2022
Dracoro said:
That unique index field: are you setting it when inserting, or is it an auto-generated identity (or MariaDB equivalent) type field?

[snip]

Anyway, I would suggest doing a data transfer into a new “temporary” table with no indexes (other than maybe an identity field). That should help you better ascertain where the issue is (e.g. network, batching, DB auto-growth, etc.).
Cheers :)

I believe the index could be auto-generated, and it was on an integer-type column (a row ID, basically). I was told it is MariaDB, but checking the properties in MySQL Workbench it says InnoDB engine. Perhaps InnoDB is just the storage engine underneath, as I am not too familiar with databases outside of SQL Server :)

I have also tried dropping the index on the table, but didn't notice much performance improvement.

Starting to suspect it's the ODBC connection I am using in SSIS. Shame I cannot get the Oracle MySQL .NET driver working (not saving the password), as I would love to test the "Fast Load" feature to see if that's better. The licensed Visual Studio is 2010, and I believe there is a compatibility issue with the latest MySQL .NET drivers. Tried the legacy 6.x versions, but that doesn't seem to help.

rustyuk

4,655 posts

217 months

Sunday 13th November 2022
If it's ODBC killing performance, then just spin up 10 threads and do 3k rows per thread.

You should update to the latest version of .NET too. VS Community is free unless you work for a large organisation. VS Code is free for everyone.

anxious_ant

Original Poster:

2,626 posts

85 months

Tuesday 15th November 2022
rustyuk said:
If it's ODBC killing performance, then just spin up 10 threads and do 3k rows per thread.

You should update to the latest version of .NET too. VS Community is free unless you work for a large organisation. VS Code is free for everyone.
Thanks for the advice. I am using SSIS to push the data, though.
I believe SSIS only comes with the full version of Visual Studio?

maffski

1,880 posts

165 months

Tuesday 15th November 2022
anxious_ant said:
maffski said:
Can you try running more than one connection and inserting records in parallel?
Not too sure how to do this in SSIS…
I don't use SSIS, but does this help? https://youtu.be/yf5pJt_2pe8

Basically, create several tasks that each cover a range of your primary key and then run them in parallel. If the speed limitation is due to the connection, and multiple tasks each create their own connection, this should speed things up.
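The key-range split can be sketched like this (Python threads standing in for parallel SSIS tasks; the worker body is a placeholder, not real insert code):

```python
from concurrent.futures import ThreadPoolExecutor

def key_ranges(min_id, max_id, workers):
    """Split [min_id, max_id] into contiguous chunks, one per worker."""
    step = (max_id - min_id + 1 + workers - 1) // workers
    return [(lo, min(lo + step - 1, max_id))
            for lo in range(min_id, max_id + 1, step)]

def push_range(lo, hi):
    # Placeholder: a real worker would open its own connection and insert
    # the rows whose primary key falls in [lo, hi].
    return hi - lo + 1

ranges = key_ranges(1, 30000, 10)
with ThreadPoolExecutor(max_workers=10) as pool:
    pushed = sum(pool.map(lambda r: push_range(*r), ranges))
print(len(ranges), pushed)  # 10 30000
```

Ten workers each covering ~3k keys mirrors the "10 threads, 3k per thread" suggestion above.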


anxious_ant

Original Poster:

2,626 posts

85 months

Wednesday 16th November 2022
maffski said:
I don't use SSIS, but does this help? https://youtu.be/yf5pJt_2pe8

Basically, create several tasks that each cover a range of your primary key and then run them in parallel. If the speed limitation is due to the connection, and multiple tasks each create their own connection, this should speed things up.

Cheers for this, will have a look.

Found out the MySQL ODBC connector has a logging function.
From the SQL logs I can confirm that individual INSERT statements are generated for each row.

Also managed to get batching working: instead of sending each statement individually, the process waits and sends them in X batches.
However, it still takes the same amount of time in batch mode as sending requests ad hoc.

Tried various other connection methods (ADO.NET, MSDataShape, OLE DB), but none of them work. Oh well…