What You Should Know about Data Mining
Data mining is a tool to guide the speedier development of better products.
The modern world, design engineers included, is awash in data. Rather than be swept away in this tidal wave, more forward-thinking product developers are beginning to pay attention to how data mining tools can be used to their advantage.
What is data mining? A standard definition is the nontrivial extraction of implicit, previously unknown and potentially useful knowledge from data. Another definition is a variety of techniques used to identify nuggets of information or decision-making knowledge in bodies of data. For the design engineer, data mining is a potential tool to guide the development of better products and a way to get at these new product specifications faster.
Data that is relevant to the design engineering process may come from many sources. Microprocessor-controlled machines generate a great deal of data in their normal course of operation. Many manufacturing processes similarly create data that is used to monitor quality, throughput and other measures. Repetitive cycles of prototype testing or simulations of complex products can also create vast quantities of data. More indirect sources of data include marketing department-driven studies of customer satisfaction. Whatever the source, the promise of data mining is to discover new information on product function that is not available from a more routine statistical analysis. That information can guide the engineering of a better machine and help the team arrive at its product definition faster.
In machine design, there are both hardware and software design issues. Data mining can speed things up in two ways. First, analysis of operating data, including failures, on existing machines can reveal potential improvements in quality, uptime and throughput. Second, most computerized machinery has a significant component of control software, which can still be changed after prototype and production machines are built. As operating data from those machines is analyzed, data mining results can more quickly identify the need for software changes.
David Harris, president of David Harris Design Associates (Chicago, IL), which specializes in fast-track product development, describes the opportunity that data mining provided to his group's design of a motion-controlled operating room table: "Previously, we just had a hydraulic machine. When we added robotic control, we saw an immediate benefit. Unlike its dumb predecessor, the robotically-controlled machine was generating operational data. With data mining tools, we were able to work with this data and mine it for a great deal of valuable information, such as how to develop an optimized service protocol and research actual use in creating the specification for the next generation product."
This is by no means a unique example. While machine development depends on many variables, one that is often overlooked is the quality of the design specification. These documents are usually the end result of applying collective wisdom to an initial specification, averaged from a list of drivers that may include the following:
- Old machine specifications
- Competitive machine specifications
- Customer requests
- Service insights
- New technology capabilities
While most design engineers will rely on mathematical experts to implement data mining routines, a general understanding of how data mining proceeds will empower a product development team to rapidly guide data mining exercises to meaningful economic results.
Generally speaking, data mining protocols include three discrete stages:
1. Data cleaning
2. Data transformation
3. Model building
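The three stages listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular data mining product. The records, the plausibility limits of 0 to 200 degrees, the z-score rescaling and the one-split failure rule are all hypothetical choices made for the example. A real project would likely use richer data and a more capable model.

```python
from statistics import mean, pstdev

# Hypothetical operating records: (motor temperature in deg C, cycle failed?).
# None marks a missing sensor reading; 999.0 is an obviously bad value.
RAW = [(61.0, False), (63.5, False), (None, False), (88.0, True),
       (64.2, False), (999.0, False), (91.5, True), (86.0, True)]


def clean(records, low=0.0, high=200.0):
    """Stage 1: drop records with missing or physically implausible readings."""
    return [(t, f) for t, f in records if t is not None and low <= t <= high]


def transform(records):
    """Stage 2: rescale temperature to z-scores (mean 0, standard deviation 1)."""
    temps = [t for t, _ in records]
    mu, sigma = mean(temps), pstdev(temps)  # assumes the readings are not all equal
    return [((t - mu) / sigma, f) for t, f in records]


def build_model(records):
    """Stage 3: fit a one-split rule, 'predict failure when z > threshold'."""
    values = sorted(z for z, _ in records)
    # Candidate thresholds are midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(threshold):
        # Fraction of records where the rule agrees with the recorded outcome.
        return sum((z > threshold) == f for z, f in records) / len(records)

    best = max(candidates, key=accuracy)
    return best, accuracy(best)


cleaned = clean(RAW)
scaled = transform(cleaned)
threshold, score = build_model(scaled)
print(f"kept {len(cleaned)} of {len(RAW)} records")
print(f"rule: failure when z > {threshold:.2f} (training accuracy {score:.0%})")
```

Each stage feeds the next: cleaning removes the two unusable readings, transformation puts the remaining temperatures on a common scale, and model building searches for the single threshold that best separates failed cycles from good ones. The accuracy reported here is measured on the same records used to fit the rule, so it says nothing about how the rule would perform on new data.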
The glossary below defines many of the terms used in data mining that are important for machine designers to know.





