While master data management (MDM) solutions may take many forms, most share a similar architecture. This architecture provides the structured environment in which MDM tools operate, which is what allows data and data processes to be managed accurately and consistently. At the core of these systems is the MDM hub, a database in which master data is collected, cleaned and stored. MDM solutions may use multiple hubs to govern different sets of data, such as product information, customer data and site data, and each hub generally follows one of three common models: transaction/repository, registry, or hybrid.
In a transaction/repository-style hub, all relevant data is stored in and accessed from a single database, which must contain all of the information needed by the different applications that access it. All data is consolidated and centralized, then published to the individual data sources after it has been linked and matched. This style of hub creates a single source of data and minimizes duplication, because duplicates are easier to detect as data is collected and cleaned. However, the transaction/repository style has drawbacks as well. Existing applications may have to be modified to use the master data, and in some cases this is not possible. Additional applications and services may be needed to serve as an interim interface between the MDM software and the data-dependent applications, which can add to costs. Also, data models need to be comprehensive enough to include all relevant information for the applications that use them, but not so large that they become unwieldy.
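
The consolidate, match and publish flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `RepositoryHub` class, the `match_key` rule (matching customers on a normalized email address) and the sample records are hypothetical, and real MDM products use far richer matching and survivorship rules.

```python
from dataclasses import dataclass, field


def match_key(record: dict) -> str:
    """Hypothetical matching rule: a normalized email identifies a customer."""
    return record["email"].strip().lower()


@dataclass
class RepositoryHub:
    # Single central store: match key -> consolidated record.
    golden: dict = field(default_factory=dict)

    def ingest(self, source: str, record: dict) -> None:
        """Clean one source record and merge it into the central store."""
        key = match_key(record)
        cleaned = {**record, "email": key}
        merged = self.golden.setdefault(key, {"sources": []})
        # Keep the first non-empty value seen for each attribute.
        for attr, value in cleaned.items():
            if value and not merged.get(attr):
                merged[attr] = value
        merged["sources"].append(source)

    def publish(self) -> list:
        """Return the consolidated records for distribution to applications."""
        return list(self.golden.values())


hub = RepositoryHub()
hub.ingest("crm", {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""})
hub.ingest("billing", {"email": "ann@example.com", "name": "", "phone": "555-0100"})
records = hub.publish()
```

Because both source records resolve to the same match key, the hub holds one consolidated record rather than two, which is the duplicate detection the paragraph refers to. The sketch also hints at the modeling burden: the central record has to carry every attribute that any consuming application needs.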
Registry-style hubs, in contrast, do not store master data in the hub; master data is instead maintained within the native application databases. The hub stores lists of keys that can be used to access all relevant attributes for a specific master data entity, linking these attributes across application databases. The registry style allows applications to remain largely intact, as all data is managed within native databases. However, when a request is made for master data, the data must be located, a query must be distributed across numerous databases, and a list of the requested data must be assembled, all in real time. As the number of source databases grows, this can become increasingly inefficient. In addition, duplicate data entities can reside in different databases, or even within the same database, and while consolidating and cleaning the individual databases would be ideal, it is not always practical. Another disadvantage is that when new databases are added to the hub registry, new keys must be added to the existing tables, which may also require altering how queries are generated.
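
A rough sketch of the registry lookup follows, assuming in-memory dictionaries as stand-ins for the native application databases. The names `REGISTRY`, `SOURCES` and `fetch_master_record`, along with the sample keys and values, are hypothetical and not taken from any particular MDM product.

```python
# Stand-ins for native application databases (hypothetical contents).
CRM_DB = {"c-17": {"name": "Ann Lee", "email": "ann@example.com"}}
BILLING_DB = {"b-903": {"balance": 42.50}}
SOURCES = {"crm": CRM_DB, "billing": BILLING_DB}

# The registry holds only cross-reference keys, never the attributes themselves.
REGISTRY = {"customer-1": [("crm", "c-17"), ("billing", "b-903")]}


def fetch_master_record(master_id: str) -> dict:
    """Assemble a view of one entity by querying every source that holds part of it."""
    view = {}
    for source_name, local_key in REGISTRY[master_id]:
        # One lookup per source, so the cost grows with the number of sources.
        view.update(SOURCES[source_name][local_key])
    return view


record = fetch_master_record("customer-1")
```

Every request walks the full key list and touches each source, which is why this style becomes slower as sources are added. Bringing in a new database means extending both `SOURCES` and every affected `REGISTRY` entry, mirroring the key-table changes described above.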
Hybrid-style hubs use methods from both transaction/repository and registry-style hubs, and try to address some of the issues present in each. Since it may not be practical to update existing applications or to send massive, inefficient queries across several databases, the hybrid approach leaves master data in the native databases and generates keys and IDs to access it, but replicates some of its important attributes to the hub. When queries are made, the hub can service the more common requests, and queries only need to be distributed for the less-used attributes, which results in a more efficient process. While the hybrid style combines advantages of both of its parent models, it has its own disadvantages. Since it stores data replicated from outlying databases, it may run into update issues when the hub copies fall out of sync with their sources, and, as with the transaction/repository style, deciding which attributes to store, what names to use and what format to store them in can create problems.
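
The hybrid lookup path can be sketched in the same spirit. This is an illustration under stated assumptions, not a reference design: the `HUB` layout, the `get_attributes` function and the choice of `name` and `email` as the replicated attributes are all hypothetical.

```python
# Hypothetical native databases that remain the systems of record.
CRM_DB = {"c-17": {"name": "Ann Lee", "email": "ann@example.com", "segment": "retail"}}
BILLING_DB = {"b-903": {"balance": 42.50}}
SOURCES = {"crm": CRM_DB, "billing": BILLING_DB}

# Hub entry: cross-reference keys plus a replicated copy of frequently used attributes.
HUB = {
    "customer-1": {
        "keys": [("crm", "c-17"), ("billing", "b-903")],
        "replicated": {"name": "Ann Lee", "email": "ann@example.com"},
    }
}


def get_attributes(master_id: str, wanted: list) -> tuple:
    """Return (values, sources_queried) for the requested attributes."""
    entry = HUB[master_id]
    # Serve whatever the hub already holds.
    values = {a: entry["replicated"][a] for a in wanted if a in entry["replicated"]}
    queried = []
    missing = [a for a in wanted if a not in values]
    if missing:
        # Fall back to a distributed lookup only for attributes the hub lacks.
        for source_name, local_key in entry["keys"]:
            row = SOURCES[source_name][local_key]
            queried.append(source_name)
            for a in missing:
                if a in row:
                    values[a] = row[a]
    return values, queried


common, common_sources = get_attributes("customer-1", ["name", "email"])
rare, rare_sources = get_attributes("customer-1", ["name", "balance"])
```

The first call is answered entirely from the hub and queries no source, while the second has to fan out to find `balance`. Note that nothing in this sketch refreshes `replicated` when a source row changes, so a later edit to the name in `CRM_DB` would leave the hub serving the old value; that is the update issue the paragraph mentions.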


