Challenges and Strategies of SOLIDWORKS PDM Data Loading
Loading or migrating data into SOLIDWORKS PDM can be a daunting task. However, with diligent planning and thorough testing, the process can proceed smoothly. The following strategies should help ease the migration pain.
With any migration project, the first task—and perhaps the key to the entire project—is analyzing the data to be loaded. This will establish the scope of the migration effort and what the reasonable expectations should be. Depending on the project size or how it will be executed, an initial cursory review can be undertaken to get a feel for what is involved. However, in most cases a more detailed review will need to occur.
When performing such reviews, we break them down into four different areas of analysis:
1. Where is the legacy data coming from? Determine the legacy data, its location and the nature of the data.
- Does the data exist solely on the company’s network and is it primarily folder/file based?
- Is the data coming from a legacy PDM or PLM system? In such cases, there is usually a database behind the system, along with a file vault or file storage system, so the migration involves more than pure file- and folder-based data. Sometimes the migration consists of a combination of files, network folders and metadata held in Excel spreadsheets or exported from an ERP system. Among legacy systems, we have seen everything from proprietary customer databases to commercial PLM systems with familiar names like ENOVIA SmarTeam, Autodesk Vault, PTC Windchill and more.
- Is there hard copy data, such as a file cabinet with physical files in it? If so, part of the migration involves scanning the files and loading them into the system. This doesn’t happen often these days, but it does happen.
- Understand the types of data that will be migrated. Is it pure metadata, or possibly bill of materials information with no files behind it? Is it CAD data with internal links and relationships between the files (such as model-to-drawing links or assembly-to-subassembly and part links)? Are there more “flat” files that have no relationships (such as PDFs, JPEGs, Word and Excel files)?
2. Determine how to best extract the data. Obtain access to the data and anticipate extraction issues.
- If the data is in a third-party or legacy system, can we get access to the database?
- Can we read the data in the database? Most legacy system databases can be read directly, and we can typically reverse-engineer them and get the data out. Sometimes, however, the system is proprietary and requires a third party's help to retrieve the data.
- Determine any missing data. Legacy systems can be 10 or 20 years old or more, and data may be missing for a number of reasons, such as a backup and recovery gone wrong or poor data management practices.
- How should we handle these issues? Are we going to ignore missing data? Is the end customer going to create or find the missing data?
- Bring in outside experts if necessary. Just as one person cannot perform every job involved in building a house, one person cannot cover every part of a migration. It makes sense to bring in outside experts for specific areas of the migration, or experts in the particular proprietary system being migrated.
3. Define mapping rules that lay out how the data will be migrated into PDM.
An initial mapping of how the data will be migrated should be done at the beginning of the project. Expect that, as the migration project progresses and end users begin to see and work with the migrated data, the way the data is mapped and migrated will most likely change.
The mapping exercise usually involves:
- Folder structure. Where will the data and files be placed inside SOLIDWORKS PDM?
- Metadata mapping. How will the metadata be mapped over, and what are the variable mapping rules?
- State mapping. In which workflows and states will the data appear in SOLIDWORKS PDM once migrated? Every file must be associated with a workflow and a specific state; what rules determine that?
- Other special logic to apply. For example, in some legacy systems we’ve seen data with three fields called ‘title one,’ ‘title two’ and ‘title three.’ The logic is to concatenate those three fields into one and migrate the result into a single title field in SOLIDWORKS PDM. Another example is converting numeric revisions to alpha revisions, or vice versa.
4. Estimate migration time.
- Develop a good estimate of the data volume to be migrated, including (1) the number of files, (2) the number of revisions/versions and (3) the number of projects/folders. If possible, also determine the number of links, such as parent-to-child or document-to-document links.
- From the data estimates, we should be able to estimate how long it will take to actually run the migration. This is not the duration of the full migration project; it is the time needed to run the migration and load the data from the legacy system into the new SOLIDWORKS PDM system. As a rough guideline, fewer than 100,000 files can be migrated in less than a weekend; 100,000 to 500,000 files can be migrated over a weekend; and more than 500,000 files will require a long weekend, or a delta migration should be considered. (We discuss delta migrations later in this article.)
- Understand what is acceptable down time and develop a high-level cut-over plan.
Usually, such migration projects take place over a weekend: the legacy system is turned off on a Friday and the customer starts working in SOLIDWORKS PDM the following Monday. Sometimes, however, the amount of data means it will take longer than a weekend. If the customer is multinational, different time zones will shrink the migration window. In summary, a complete understanding of how much data will be migrated lets us estimate the time required and work with the customer to meet their downtime expectations.
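The volume and timing questions above can be sketched in a few lines of Python. This is a minimal illustration, not our migration tool kit: it counts files under a legacy network folder, applies the rough run-time guideline from item 4, and checks whether a load fits a downtime window. The `files_per_hour` throughput is a hypothetical input that should come from a timed test migration, and the 25% safety buffer is an arbitrary assumption.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path

# SOLIDWORKS file types that carry internal references (parts, assemblies, drawings).
CAD_EXTENSIONS = {".sldprt", ".sldasm", ".slddrw"}


def inventory(root: Path) -> Counter:
    """Count files under a legacy folder tree: 'cad' vs. other ("flat") files by extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            counts["cad" if ext in CAD_EXTENSIONS else (ext or "<no extension>")] += 1
    return counts


def guideline(file_count: int) -> str:
    """Rough run-time guideline quoted in this article (not a guarantee)."""
    if file_count < 100_000:
        return "less than a weekend"
    if file_count <= 500_000:
        return "a weekend"
    return "a long weekend, or consider a delta migration"


def fits_window(file_count: int, files_per_hour: float, start: datetime, end: datetime,
                buffer: float = 0.25) -> bool:
    """True if the estimated load time plus a safety buffer fits between start and end."""
    window_hours = (end - start).total_seconds() / 3600
    estimated_hours = file_count / files_per_hour
    return estimated_hours * (1 + buffer) <= window_hours


if __name__ == "__main__":
    friday_6pm, monday_6am = datetime(2024, 6, 7, 18), datetime(2024, 6, 10, 6)  # 60-hour window
    print(guideline(300_000))                                   # a weekend
    print(fits_window(300_000, 6_000, friday_6pm, monday_6am))  # False: 50 h x 1.25 = 62.5 h
```

For a multinational customer, the `start` and `end` values would be the narrower window left after accounting for every site's business hours.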
In addition to what is discussed above, if you are moving to a new PDM or PLM system such as SOLIDWORKS PDM, this is a perfect opportunity to review your business processes and to configure SOLIDWORKS PDM so that it accommodates them. Consider which processes are working today, which ones are not, and what is causing unsatisfactory user experiences, and then make the appropriate changes.
Data Cleanups in Migrations
With migrations, there’s always an aspect of data cleanup. Many customers say, “Oh, our data is clean and it’s going to be a smooth migration, nothing to worry about.” But there is always some cleanup, because every legacy dataset inevitably contains some messy data.
Half the battle with data cleanup is identifying where the issues are and creating rules to fix them. The following are different areas of “messy” data that we work to identify and clean up:
- Duplicate part numbers pointing to the same file. For instance, part number 123 is related to file A, and part number 456 is also related to file A. Are these truly two different part numbers represented by the same file, or duplicate part numbers that represent the same part?
- Duplicate file names. For instance, suppose three files are all called washer1.sldprt. Do they all represent the same washer, or are they different washers that were created in the system and should each have a unique file name?
- Missing or incorrect file attributes. Is the part number missing? Is the revision missing or in a bad format? Is other metadata wrong?
- Missing or incorrect file associations or links. When dealing with CAD files, we need to ensure the links associated with them are correct and valid. For instance, is the drawing correctly linked to its SOLIDWORKS model, and do SOLIDWORKS assemblies correctly reference their child components? When migrating data from network folders, we see many instances where such links are broken.
- File revision issues. Here, the revisions associated with the data are not in the correct format. For example, the PDM revision scheme may be alpha, while revisions in the legacy data have values such as ‘999’ or some other non-alpha value. Other cases may involve released data referencing non-released data.
- Missing critical data. For any field that will be required in PDM, data must exist in the legacy system. If it is missing, it should be reported and fixed. Key fields to validate are Part Number, Revision and Description.
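Most of these checks can be automated once the legacy data has been extracted. The following Python sketch is an illustration under assumptions: each extracted record is a dict with hypothetical keys `file`, `part_number`, `revision` and `description`; file paths are Windows-style; links are (parent, child) pairs taken from the legacy system; and the target PDM revision scheme is alpha (A to Z, then AA and so on). It reports problems rather than fixing them, because the fix for each one is usually a business decision.

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath

REQUIRED_FIELDS = ("part_number", "revision", "description")
ALPHA_REVISION = re.compile(r"^[A-Z]{1,2}$")  # assumed target scheme: A..Z, AA..ZZ


def duplicate_part_numbers(records):
    """Files referenced by more than one part number (e.g. 123 and 456 both -> file A)."""
    numbers_by_file = defaultdict(set)
    for rec in records:
        if rec.get("part_number"):
            numbers_by_file[rec["file"]].add(rec["part_number"])
    return {f: sorted(n) for f, n in numbers_by_file.items() if len(n) > 1}


def duplicate_file_names(records):
    """File names (compared case-insensitively) that occur at more than one path."""
    paths_by_name = defaultdict(set)
    for rec in records:
        # PureWindowsPath splits legacy Windows paths correctly on any OS.
        paths_by_name[PureWindowsPath(rec["file"]).name.lower()].add(rec["file"])
    return {name: sorted(p) for name, p in paths_by_name.items() if len(p) > 1}


def field_problems(records):
    """Missing required fields and revisions that do not fit the target scheme."""
    problems = []
    for rec in records:
        for field in REQUIRED_FIELDS:
            if not str(rec.get(field) or "").strip():
                problems.append((rec["file"], f"missing {field}"))
        rev = str(rec.get("revision") or "").strip()
        if rev and not ALPHA_REVISION.match(rev):
            problems.append((rec["file"], f"revision {rev!r} is not alpha"))
    return problems


def broken_links(links, known_files):
    """(parent, child) references whose child file was not found in the extraction."""
    return [(parent, child) for parent, child in links if child not in known_files]


def released_referencing_unreleased(links, state_by_file):
    """Released parents that reference a child in a non-released state."""
    return [(parent, child) for parent, child in links
            if state_by_file.get(parent) == "Released"
            and state_by_file.get(child, "Released") != "Released"]
```

Each function returns a plain list or dict, so the results can be written straight into the cleanup report that the customer reviews.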
Once you have located and assessed the data, you can determine the best technology and methodology to migrate the data. We suggest the following options subject to specific customer requirements:
For simple cases, we recommend a Manual Processing – Drag & Drop approach. This is usually best for small sets of data stored on network drives. The pros of this approach are that it is easy to do, requires little or no spending on migration services, automatically builds relationships for CAD files and maps file properties to data card variables.
However, a Drag & Drop approach has limitations you must be aware of: only the latest data can be imported; values that are not stored as properties in the files do not get set on the data card; the result mimics the source folder structure; and you need to manually check files and change their state if required.
When we get involved with a migration project, we usually make use of a pre-developed migration tool kit. We break the migration into two parts: an extraction piece that will extract data from the legacy system, and an import piece that will import into SOLIDWORKS PDM. (For the import piece, we frequently make use of the SOLIDWORKS PDM XML Import tool, which we will discuss later.)
Having such a tool kit allows us to handle almost any data loading scenario. Such scenarios usually involve:
- Migration of revision history.
- Dynamic attribute mappings.
- Dynamic state and workflow mapping.
- Dynamic placement of files in folders based on customer specific rules or data structure in a legacy system.
- Event logging.
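To illustrate what “dynamic” mapping means in practice, here is a minimal Python sketch in which the attribute, state/workflow and folder rules live in tables rather than in code, and every decision is logged. The legacy column names (`FILE`, `PARTNO`, `DESC`, `REV`, `STATUS`, `PROJECT`), the PDM variable, workflow and state names, and the folder rule are all hypothetical; in a real project they come out of the mapping exercise with the customer. The output rows are an intermediate format, not the input schema of any SOLIDWORKS import tool.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("migration")

# Hypothetical rule tables, normally loaded from a file agreed with the customer.
ATTRIBUTE_MAP = {"PARTNO": "Number", "DESC": "Description", "REV": "Revision"}
STATE_MAP = {  # legacy status -> (PDM workflow, PDM state)
    "WIP": ("Engineering", "Under Editing"),
    "REL": ("Engineering", "Released"),
    "OBS": ("Engineering", "Obsolete"),
}


@dataclass
class ImportRow:
    file: str
    folder: str
    variables: dict
    workflow: str
    state: str


def folder_for(legacy: dict) -> str:
    """Hypothetical placement rule: project folder, then first three characters of the part number."""
    project = legacy.get("PROJECT") or "Unassigned"
    prefix = str(legacy.get("PARTNO") or "")[:3] or "misc"
    return f"{project}/{prefix}"


def map_record(legacy: dict) -> Optional[ImportRow]:
    """Apply attribute, state/workflow and folder rules; log and skip records no rule covers."""
    status = legacy.get("STATUS")
    if status not in STATE_MAP:
        log.warning("No state rule for %s (status=%r); record skipped", legacy.get("FILE"), status)
        return None
    workflow, state = STATE_MAP[status]
    variables = {pdm: legacy[src] for src, pdm in ATTRIBUTE_MAP.items() if src in legacy}
    row = ImportRow(legacy["FILE"], folder_for(legacy), variables, workflow, state)
    log.info("Mapped %s -> %s [%s / %s]", row.file, row.folder, workflow, state)
    return row


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    map_record({"FILE": "A.sldprt", "PARTNO": "123-001", "DESC": "Bracket",
                "REV": "B", "STATUS": "REL", "PROJECT": "P100"})
    map_record({"FILE": "B.sldprt", "STATUS": "HOLD"})
```

To migrate revision history rather than only the latest data, the same function would be applied to each legacy revision of a file in order, so that every version receives the same rules.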
As mentioned above, one of the key tools we employ is provided by SOLIDWORKS: the SOLIDWORKS PDM XML Migration Tool.
One of the main reasons we use this tool is that it is supported by SOLIDWORKS. We never want to find ourselves in a situation where we migrate data and SOLIDWORKS later finds that the migration has corrupted the system. Because we are using a tool that SOLIDWORKS supports, they will stand behind how the data gets loaded into the system. The tool also cuts down on some of the development effort that would otherwise go into each migration project.
Even when we use this tool, there are still issues or behaviors that need to be addressed. These include:
- It does not support one-to-many or many-to-one mapping (the ‘title one,’ ‘title two,’ ‘title three’ mapping issue outlined earlier). If this is a requirement, we handle it on the data extraction side.
- No support for multiple revision schemes, for example numeric and alpha (one scheme could run from 1 to 99 and the other from A to Z), although there are ways to handle this.
- The tool loads data in batches of 1,000. This is a problem because a misconfigured rule or a single bad file can cause a file to fail, and in that case the entire batch of 1,000 files fails. Hence it is important to make sure the extracted data meets the rules set up in PDM.
- Slower load times, meaning not all data can be loaded during typical downtimes. However, there are workarounds for this.
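Because the tool does not handle many-to-one mapping and fails whole batches, it helps to do that work on the extraction side and to screen records before they are batched. The Python sketch below assumes hypothetical legacy columns `FILE`, `TITLE1` to `TITLE3` and `REV`, a numeric revision scheme that starts at 1 (1 becomes A, 26 becomes Z, 27 becomes AA), a hypothetical set of placeholder revisions such as ‘999’ that must be rejected rather than converted, and the batch size of 1,000 described above. Records that would fail are set aside in a rejection list instead of taking a whole batch down with them.

```python
PLACEHOLDER_REVISIONS = {"999"}  # hypothetical sentinel values to reject, not convert


def merge_titles(rec: dict, fields=("TITLE1", "TITLE2", "TITLE3")) -> str:
    """Many-to-one mapping: join the non-blank title fields into one title."""
    parts = (str(rec.get(f) or "").strip() for f in fields)
    return " ".join(p for p in parts if p)


def numeric_to_alpha(rev: str) -> str:
    """Convert a numeric revision to alpha: '1' -> 'A', '26' -> 'Z', '27' -> 'AA'."""
    n = int(rev)
    if n < 1:
        raise ValueError(f"unsupported numeric revision {rev!r}")
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_alpha(rev) -> str:
    """Normalize mixed legacy schemes onto one alpha scheme; reject anything else."""
    rev = str(rev).strip()
    if rev in PLACEHOLDER_REVISIONS:
        raise ValueError(f"placeholder revision {rev!r}")
    if rev.isdigit():
        return numeric_to_alpha(rev)
    if rev.isalpha() and rev.isupper():
        return rev
    raise ValueError(f"unrecognized revision {rev!r}")


def prepare(records, batch_size: int = 1000):
    """Transform and screen records; return (clean batches, rejected (file, reason) pairs)."""
    clean, rejected = [], []
    for rec in records:
        try:
            clean.append({
                "file": rec["FILE"],
                "title": merge_titles(rec),
                "revision": to_alpha(rec["REV"]),
            })
        except (KeyError, ValueError) as exc:
            rejected.append((rec.get("FILE"), repr(exc)))
    batches = [clean[i:i + batch_size] for i in range(0, len(clean), batch_size)]
    return batches, rejected
```

The rejection list doubles as a report for the data cleanup discussion, so problem records are fixed at the source before the next test migration.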
Conducting Delta Migrations
When conducting a migration project, we prefer and generally recommend what we call a Big Bang migration. This means we do the full migration at one time: we shut down the legacy system on Friday, load all the data over the weekend, and come Monday the customer is using SOLIDWORKS PDM.
That’s the most cost-efficient approach. However, if the dataset is extremely large or the downtime period is too short, we can perform what we call a Delta Migration.
In that case, we start migrating the legacy system into SOLIDWORKS PDM while users continue to work in the legacy system. That initial migration into PDM might take one to two weeks or more. Then, over a specified weekend, we determine the delta: the data users created or changed in the legacy system during the one-, two- or three-week period since the original migration. Finally, we load that delta over the weekend; it is a much smaller dataset that can be completed within that window.
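Determining the delta can be as simple as comparing two manifests. Here is a minimal Python sketch, assuming the legacy system can report a version number or last-modified stamp per file (a hypothetical capability that varies by system) and that a manifest was captured when the initial extraction started.

```python
def compute_delta(snapshot: dict, current: dict):
    """Compare {file_id: version_or_timestamp} from the initial extraction with the legacy system now.

    Returns (added, changed, deleted) as sorted lists of file ids.
    """
    added = sorted(current.keys() - snapshot.keys())
    deleted = sorted(snapshot.keys() - current.keys())
    # Use != rather than > so clock or numbering quirks in the legacy system still surface a change.
    changed = sorted(k for k in current.keys() & snapshot.keys() if current[k] != snapshot[k])
    return added, changed, deleted


if __name__ == "__main__":
    at_initial_load = {"A.sldprt": 3, "B.sldasm": 1, "C.slddrw": 2}
    at_cutover = {"A.sldprt": 3, "B.sldasm": 2, "D.sldprt": 1}
    print(compute_delta(at_initial_load, at_cutover))
    # (['D.sldprt'], ['B.sldasm'], ['C.slddrw'])
```

Deleted files need a business rule of their own (for example, whether they should be marked obsolete in PDM), and changed files would typically be loaded as new versions of what was already migrated.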
Conducting a Delta Migration is always an option in migration projects. However, Delta Migrations take longer and are more costly, because we need to test and validate not only the main migration but also the Delta Migration portion. For this reason, we view a Delta Migration as less than optimal and instead try to accomplish the data loading in a Big Bang migration. Often this can be done by improving the hardware: using SSD drives, increasing memory and optimizing the disk drives for better input/output. If such optimization still does not provide the required performance, we have the Delta Migration to fall back on.
PDM Merge: A Special Case for PDM-to-PDM Migrations
Many times, we come across customers that have two separate SOLIDWORKS PDM vaults. Maybe they have two separate business units that set up their own vaults, or one company that is running SOLIDWORKS PDM bought another company that had its own SOLIDWORKS PDM system. This scenario is not unusual. We are often looking at two or more vaults being merged into one.
Why would companies want to merge their vaults?
- Merging may increase collaboration—with one system instead of two, everyone can collaborate together and reduce duplication of parts.
- This collaborative single-system environment can reduce IT overhead by reducing the number of servers required to run PDM.
PDM merge projects are essentially migration projects, because we are merging or migrating data from one PDM system into another, and they can be just as complex as any other data migration project.
Similar to migrations, half the battle with merges is developing the business rules for how the merged vault will behave. If the vaults behave differently (e.g., they have different workflows, revision schemes, folder structures and security models), we have to address those differences. Sometimes the merge rules can be relatively simple; other times they can get quite complex, because we are taking PDM systems that were designed for different business processes and merging not only the data but also the business processes.
We have tools and mechanisms to automate the process. As consultants, we will facilitate the discussions of how vaults should be merged, and also execute the actual merging process. Some of the discussions that need to happen with your consultant include:
- Revision mapping. Will the systems use the same revision scheme or different ones?
- What are the workflow mappings between the different vaults? For instance, do the different vaults have different engineering change processes, and will one be used overall or will the system support different change process workflows?
- Do we need to deal with duplicate file names, or with files that exist in both systems and represent the same file?
- Should we merge the latest data only, the full history, or the latest in work and all released?
- Do the vaults being merged use Toolbox or design libraries? Are those libraries the same in each vault, or duplicated across the vaults?
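For the duplicate-file question, a first pass can compare file names and content across the two vaults. The Python sketch below is illustrative only and assumes that each vault's files have been fetched to a local folder, that file names are unique within each vault, and that files with matching names should be compared by SHA-256 content hash. Identical hashes show that two files are byte-for-byte the same; different hashes do not prove that two files are different parts, since a file that was simply re-saved can hash differently, so conflicting names still need human review.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large assemblies do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vault_index(local_copy: Path) -> dict:
    """Map lower-cased file name -> content hash for a local copy of one vault."""
    return {p.name.lower(): file_digest(p) for p in local_copy.rglob("*") if p.is_file()}


def classify_overlap(index_a: dict, index_b: dict):
    """Split names present in both vaults into identical content and conflicting content."""
    identical, conflicting = [], []
    for name in sorted(index_a.keys() & index_b.keys()):
        if index_a[name] == index_b[name]:
            identical.append(name)
        else:
            conflicting.append(name)
    return identical, conflicting
```

Identical files are candidates to load once; the conflicting list feeds the discussion about renaming or renumbering with the consultant.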
Testing the Migration Process
In our opinion, testing the migrations is the most important step in migration projects. No migration is performed successfully the first time.
Expect to discover issues and then mitigate them. There could be new issues you find with your legacy data. You may first complete all your data cleanup, then load the data into the system, and all of a sudden realize that there are more undetected legacy issues that need to be fixed. These could be the result of bad links, blank fields or many other issues.
You may realize some of your migration rules are bad. As a simple example, suppose you set up a field in PDM called ‘cost’ as an integer field, and then discover that the cost information in your legacy data contains characters: perhaps a currency symbol ($100 rather than ‘100’) or a value written out in text. In such cases, the fix may be as easy as changing the PDM field from integer to text, or cleaning up the legacy data and removing any non-integer values from the field.
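A cleanup rule for the ‘cost’ example might look like the following Python sketch. It assumes the PDM field stays an integer, accepts values such as `100`, `$100`, `1,250` or `100.00`, and reports everything else (blank values, fractions, text such as ‘one hundred dollars’) for the customer to fix rather than guessing.

```python
import re
from decimal import Decimal

# Optional leading '$', digits with or without thousands separators, optional decimals.
COST_PATTERN = re.compile(r"^\s*\$?\s*((?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?)\s*$")


def clean_cost(raw):
    """Return (int_value, None) when raw can go into an integer field, else (None, reason)."""
    text = "" if raw is None else str(raw).strip()
    if not text:
        return None, "blank"
    match = COST_PATTERN.match(text)
    if not match:
        return None, f"not numeric: {text!r}"
    value = Decimal(match.group(1).replace(",", ""))
    if value != value.to_integral_value():
        return None, f"not a whole number: {text!r}"
    return int(value), None
```

Using `Decimal` rather than `float` avoids rounding surprises when deciding whether a value such as `100.00` is a whole number.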
Another issue may arise around the procedural rules for data loading. For example, we frequently need to set who created, released or changed the data. Those values are based on users or usernames in the system, so in many cases the procedure is to load the users first and then load the data that references them. If that procedure is off and you load a set of data before the data it depends on, you will have issues. Proper sequencing of data loading is critical, and knowing how to establish the sequence takes experience.
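One way to make the load sequence explicit, rather than relying on memory, is to declare each step's dependencies and let a topological sort produce the order. This Python sketch uses the standard library's `graphlib` (Python 3.9 or later); the step names and their dependencies are hypothetical examples, not a prescribed SOLIDWORKS PDM sequence.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical load steps mapped to the steps that must finish first.
LOAD_STEPS = {
    "users": set(),
    "groups": {"users"},
    "folders": set(),
    "files_latest": {"users", "folders"},
    "revision_history": {"files_latest"},
    "references": {"files_latest"},
    "workflow_states": {"revision_history", "users"},
}


def load_order(steps: dict) -> list:
    """Return an order in which every step comes after all of its dependencies."""
    try:
        return list(TopologicalSorter(steps).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle, e.g. ['a', 'b', 'a'].
        raise ValueError(f"circular load dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    for number, step in enumerate(load_order(LOAD_STEPS), start=1):
        print(number, step)
```

A circular dependency is reported as an error before anything is loaded, which is cheaper than discovering it halfway through a cutover weekend.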
Sometimes we must run pre-processing cleanup, then the extractions, then the imports, and finally post-processing cleanup. It is critical to make sure that the proper process is in place and that the procedures are followed in the correct order.
Whatever errors you encounter, it is important to determine where each error is coming from and the appropriate way to fix it. One key aspect is to document your migration procedure and all of its steps, and to automate it as much as possible. Manual steps are prone to issues and human mistakes, so keep them to a minimum.
We recommend executing three to four test migrations before you conduct the production migration. There is no point in conducting a production migration if you know your migration tools have issues; you don’t want to load data that will not come into the new system correctly. Migrations take time and effort, so make sure you have properly validated the data, are employing the proper tools and have determined the optimal sequencing before you push the final GO button. Be prepared to conduct multiple test migrations.
It is also best practice to have a test plan, to document what you want to test, to carefully review the test data, and to review reports, logs and so forth. It is absolutely critical to have the end customer also review the test migrations and test data to validate the migration. After all, the customer knows the data best and can most quickly discern whether the migration has worked or whether there are substantial errors that need to be corrected.
The testing process takes a substantial amount of the project time. The components of this process are the test migration, validation, finding issues and making fixes, then rerunning the migration.
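Part of validating each test migration can be automated by reconciling what was extracted with what actually arrived in PDM. The Python sketch below assumes both sides can be exported as lists of (file name, revision) pairs; how the PDM-side list is produced is left open here. Counters are used rather than sets so that a file loaded twice also shows up.

```python
from collections import Counter


def reconcile(extracted, loaded):
    """Compare (file, revision) pairs from the extraction with those found in PDM after a test load."""
    expected, actual = Counter(extracted), Counter(loaded)
    return {
        "expected_total": sum(expected.values()),
        "loaded_total": sum(actual.values()),
        "missing": sorted((expected - actual).elements()),     # extracted but not in PDM
        "unexpected": sorted((actual - expected).elements()),  # in PDM but not extracted (or loaded twice)
    }


if __name__ == "__main__":
    report = reconcile(
        [("A.sldprt", "A"), ("A.sldprt", "B"), ("B.sldasm", "A")],
        [("A.sldprt", "A"), ("B.sldasm", "A"), ("B.sldasm", "A")],
    )
    print(report)
```

Here both totals are 3, yet one revision is missing and one file was loaded twice, which is exactly the kind of problem a simple count comparison would hide. Running the same report after every test migration gives the customer a concrete list to review alongside the migration logs.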
Often, the customer designs the system the way they think they want it to behave, but it isn’t until the migration project begins that they first understand how the real data interacts with the system, and they may then want the data to behave differently. Designing how the PDM system works is therefore also an iterative process.
The key point is that any change to the PDM data model requires re-evaluating the tests and re-running the test migration.
Pre-Data Loading Planning
Once we know our migration process works well and the test migrations look good, we need to start pre-data loading planning. This involves important questions: When will the legacy system be turned off? Which weekend is the migration weekend? How long will it take to migrate the data, so that we can discuss downtime expectations with the customer? Can it be done during non-business hours or in an evening, or can it even be done during business hours?
We want to specify the data loading process. We need to identify and document all the required pre-data loading activities, for example making the legacy system read-only and sending emails to end users announcing that the system is going to be switched off. The data loading sequence itself should also be well documented.
Once that’s in place, we identify the people responsible for the different parts of the data loading process. They could be internal or end customer IT personnel who need to make systems read-only or export databases, the key migration specialists who will likely do most of the migration, or someone who needs to validate the migration before the final go-live. An essential component of a successful migration is making sure every participant is identified, is aware of their general role and knows their specific responsibilities during the cutover weekend.
Final Data Load
We always say that with proper planning and testing, the final data loading should go smoothly. Make sure you follow your documented procedures, but allow some extra buffer time for unexpected issues. Be ready for Murphy’s Law: the network could go down, or someone could create a file at the last minute that causes an issue. Always expect the unexpected. Lastly, be prepared to support the users in the new system come Monday morning.
It is very important to define the business requirements and the migration, integration and merge rules when defining the project. These requirements are more of a business decision and often prove to be more difficult than the technical solution required to accomplish the result. With proper planning and testing, however, the project should go smoothly. We have performed hundreds of migrations and have encountered a huge variety of environments and issues. We often find that what our customer thinks is extremely difficult, we can easily handle, but what the customer thinks is easy turns out to be a significant challenge. Moving data from system A to system B may appear routine, but it rarely is, and every migration has to be customized to the specific situation encountered.
To learn more about the benefits of data management in SOLIDWORKS, check out the whitepaper Gain Competitive Advantage with Product Data Management.
About the Author
Marc Young is the President and Owner at xLM Solutions. Marc brings over 20 years of product lifecycle management (PLM) experience to the process of solving customers’ problems. He is an expert in all aspects of providing PLM solutions and services. During the course of his career, Marc has designed, integrated and expedited technology solutions for both large and small engineering and manufacturing companies seeking to improve their business processes. Marc has managed and executed countless PLM projects, such as consulting with companies on PLM best practices, developing tools and methodologies for data migrations, creating customizations to increase customer efficiency and completing PLM/PDM integration projects.