The term data mining means extracting useful pieces of raw data from different sites and turning them into new information that can be used and consumed. Another name for data mining is “KDD”, which stands for “Knowledge Discovery in Databases”. Many software tools are used for the data mining process (Tan, Steinbach and Kumar, 2016). Data mining also helps make a business more efficient and profitable: companies learn more about their customers and can then manage and customize their selling strategies. There are different stages and types of data mining (Roiger, 2017). Some common types are text mining, media mining, and web mining.
Different kinds of raw data are collected for data mining, and this data turns out to be very helpful for data mining companies (Sammut and Webb, 2017). Some of the most common kinds are business transactions, scientific data, games, and satellite data. Data mining is not limited to a single type of data, but each kind of data calls for a different procedure (Torgo, 2016). Some examples are:
Flat files carry data in a simple text or binary format (Shu, Sliva, Wang, Tang and Liu, 2017). Each file has its own structure, which determines how its data is read and made useful. The data in flat files can be scientific measurements, transactions, and much more. Flat files are one of the most common sources of data for data mining (Dutt, Ismail and Herawan, 2017), and their data is easily processed by data mining algorithms.
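As a minimal sketch of how a text flat file might be processed, the Python snippet below reads comma-separated transaction records and totals the amount per customer. The file contents, the column names (`customer`, `item`, `amount`), and the helper `total_per_customer` are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat file contents: one transaction per line, comma-separated.
FLAT_FILE_TEXT = """customer,item,amount
alice,book,12.50
bob,pen,1.20
alice,lamp,30.00
"""


def total_per_customer(text):
    """Sum the 'amount' column for each customer in a CSV-formatted flat file."""
    totals = defaultdict(float)
    # DictReader uses the first line as the column names.
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["customer"]] += float(row["amount"])
    return dict(totals)


totals = total_per_customer(FLAT_FILE_TEXT)
print(totals)
```

With a real file on disk, the same function could be fed the result of reading that file; the in-memory string is used here only to keep the example self-contained.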
A relational database consists of tables made up of rows and columns. The tables hold the values of entity attributes, or values derived from entity attributes. Each row is an ordered list of elements, also known as a tuple. The columns, on the other hand, represent the attributes, that is, the characteristics recorded in the table. Several languages are used to query a relational database, and the most commonly used is SQL. SQL lets the user compute aggregate functions such as sum, count, and max. Data mining on a relational database can also address more complex problems, for instance detecting deviations, making predictions, and comparing results.
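The aggregate operations mentioned above can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `sales` table, its columns, and the sample rows are assumptions made for this example, not part of any particular system.

```python
import sqlite3


def summarize_sales(records):
    """Return (customer, count, sum, max) rows computed with SQL aggregates."""
    conn = sqlite3.connect(":memory:")  # temporary database held in memory
    try:
        conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
        # Each record is one tuple (row) with two attributes (columns).
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        cursor = conn.execute(
            "SELECT customer, COUNT(*), SUM(amount), MAX(amount) "
            "FROM sales GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample rows.
rows = summarize_sales([("alice", 12.5), ("bob", 1.5), ("alice", 30.0)])
for row in rows:
    print(row)
```

Each returned row is itself a tuple, which mirrors the way a relational table represents one record.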
A data warehouse is a central store for all the data collected, which users can access whenever and wherever they need it. The data does not come from a single site or source; it combines different types of data from different sources, so the user can analyze them together. Data warehouses are also an important part of business intelligence.
Multimedia data includes text, video, audio, and images. It can be stored in several ways, such as in an object-oriented database or on a file system. High-dimensional data such as audio, video, and images makes data mining more challenging. Analyzing multimedia data usually requires techniques from computer graphics, computer vision, and related fields. Multimedia data is classified by application, for example repository applications and presentation applications.
A time-series database holds time-related data, such as logged activity or stock market data. This kind of data keeps arriving continuously (Kavakiotis, Tsave, Salifoglou, Maglaveras, Vlahavas and Chouvarda, 2017) and changes almost every second. That is why time-series data is challenging for the user and may require demanding real-time analysis. Data mining on a time-series database examines how variables evolve and how they relate to one another, and it includes predictions about how those variables will change. Time-series data is usually used to generate or store information for long-term business decisions.
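One simple way to follow a continuously arriving series is a moving average over the most recent values. The sketch below is only an illustration of that idea; the function name `moving_average`, the window size, and the price values are assumptions, and real time-series mining typically uses more elaborate models.

```python
from collections import deque


def moving_average(stream, window):
    """Return the running mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)  # old values drop out automatically
    averages = []
    for value in stream:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical stock prices arriving one at a time.
prices = [10.0, 12.0, 11.0, 13.0, 15.0]
smoothed = moving_average(prices, window=3)
print(smoothed)
```

Because the function keeps only the last `window` values, it can process values as they arrive without storing the whole history.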
Overall, data mining is beneficial, but it does have issues of different sorts, such as security issues and methodology problems. People and companies are, however, searching for solutions to these issues.
In data mining, a lot of sensitive data is gathered from unprotected sites. The data collected to build customer profiles, or for other personal purposes, is private yet often not well protected (Ristoski and Paulheim, 2016). That information can concern an individual or a company, and using it for any other purpose can be a huge risk for the user. Another issue lies in how the data is used: the data may be infected or otherwise harmful to the user’s device, and the device could be hacked.
There are many methods for carrying out data mining. Many procedures from artificial intelligence can handle only a limited amount of data, whereas in data mining the data is measured in terabytes or more. The amount of data is therefore sometimes too large for the application or the generator (Fournier-Viger, Lin, Gomariz, Gueniche, Soltani, Deng and Lam, 2016). This is why data mining has performance issues.
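One common way to cope with data that is too large to hold in memory at once is to process it in fixed-size chunks and keep only running totals. The sketch below assumes that approach; the helper names `iter_chunks` and `streaming_mean` and the chunk size are hypothetical, and the small `range` stands in for a much larger data source.

```python
from itertools import islice


def iter_chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def streaming_mean(values, chunk_size=1000):
    """Compute the mean while holding only one chunk in memory at a time."""
    count = 0
    total = 0.0
    for chunk in iter_chunks(values, chunk_size):
        count += len(chunk)
        total += sum(chunk)
    return total / count if count else 0.0


# A small range used as a stand-in for a very large data source.
mean = streaming_mean(range(1, 10001), chunk_size=1000)
print(mean)
```

Only the running count and total survive between chunks, so memory use depends on the chunk size rather than on the size of the full data set.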
Data mining is the procedure of collecting or generating large data sets from small pieces of data taken from different sites. Many different procedures and programs are used for data mining. Common sources of data for data mining include flat files, relational databases, data warehouses, multimedia, and time-series data. Alongside its benefits, data mining also has some issues (Asif, Merceron, Ali and Haider, 2017), including security issues, performance issues, and issues with sources of data.
- Tan, P.N., Steinbach, M. and Kumar, V., 2016. Introduction to data mining. Pearson Education India.
- Roiger, R.J., 2017. Data mining: a tutorial-based primer. CRC Press.
- Sammut, C. and Webb, G.I., 2017. Encyclopedia of machine learning and data mining. Springer.
- Torgo, L., 2016. Data mining with R: learning with case studies. CRC Press.
- Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), pp.22-36.
- Dutt, A., Ismail, M.A. and Herawan, T., 2017. A systematic review on educational data mining. IEEE Access, 5, pp.15991-16005.
- Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I. and Chouvarda, I., 2017. Machine learning and data mining methods in diabetes research. Computational and Structural Biotechnology Journal, 15, pp.104-116.
- Ristoski, P. and Paulheim, H., 2016, October. RDF2Vec: RDF graph embeddings for data mining. In International Semantic Web Conference (pp. 498-514). Springer, Cham.
- Fournier-Viger, P., Lin, J.C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z. and Lam, H.T., 2016, September. The SPMF open-source data mining library version 2. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 36-40). Springer, Cham.
- Asif, R., Merceron, A., Ali, S.A. and Haider, N.G., 2017. Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, pp.177-194.