Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
239 views
in Technique[技术] by (71.8m points)

sql - Merge identical databases into one

We have 15 databases of 75 tables with an avarage of a million rows. all with the same schema but different data. We have now been given the requirements by the client to bring all 15 into one database. Each set of data filtered by the user’s login.

The changes to the application have been completed to do the filtering. We are now left with the task of merging all databases into one.

The issue is conflicting PK and FK as the PK’s and the FK’s are of type int so we will have 15 PK ids of 1.

One idea is to use. net and the DBML to insert the records as new records into the new database letting linq deal with the PK and FK and using code to deal with duplicate data.

What other ways are there to do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It's never a trivial job to integrate databases when the records don't have unique primary keys in all databases. A few weeks ago I built a similar integration script for which I decided to use Entity Framework.

First the good news. With EF's DbContext API it's ridiculously easy to insert a complete object graph and make EF take care of all newly generated primary keys an foreign keys. The reason why this is so easy is that when an object's state is changed to Added all of its adhering objects become Added as well and EF figures out the right order of inserts. This is truly great! It made me build the core of the copy routine in a few hours, which would have been many days if I should have done it in T-SQL for example. The latter is much much more error prone too.

Of course life isn't that easy. Now the bad news:

  1. This takes tons of machine resources. Of course I used a new context instance for each copy step, but still I had to execute the program on a machine with a decent processor and a fair amount of internal memory. The exact specifications don't matter, the message is: test with the largest databases and see what kind of beast you need. If the memory consumption can't be managed by any machine at your disposal, you have to split up the routine in smaller chunks, but that will take more programming.

  2. The object graph that's changed to Added must be divergent. By this I mean that there should only be 1-n associations starting from the root. The reason is, EF will really mark all objects as Added. So if somewhere in the graph a few branches refer back to the same object (because there is a n-1 association), these "new" objects will be multiplied, because EF doesn't know their identity. An example of this could be Company -< Customer -< Order >- OrderType: when there are only 2 order types, inserting one root company with 10 customers with 10 orders each will create 100 order type records in stead of 2.

    So the hard part is to find paths your class structure that are divergent as much as possible. This won't always be possible. If so, you'll have to add the leaves of the converging paths first. In the example: first insert order types. When a new company is inserted you first load the existing order types into the context and then add the company. Now link the new orders to the existing order types. This can only be done if you can match objects by natural keys (in this example: the order type names), but usually this is possible.

  3. You must take care not to insert multiple copies of master data. Suppose the order types in the previous example are the same in all databases (although their primary keys may differ!). The order types from the source database should not be reinserted in the target database. Moreover, you must fix the references in the source data to the correct records in the target database (again by matching by natural key).

So although it wasn't trivial it was doable and the job was done in a relatively short time. I'm sure that other alternatives (t-SQL, integration services, BIDS, if doable at all) would have taken more time or would have been more buggy. And the problem with bugs in this area is that they may become apparent much later.

I later found out that the issues I describe under 2) are related to fetching the source objects with AsNoTracking. See this interesting post: Entity Framework 6 - use my getHashCode(). I used AsNoTracking because it performs better and it reduces memory consumption.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...