Due to the development of internet technology and computer science, data is growing at an exponential rate. Big data brings us new opportunities and challenges. On the one hand, we can analyze and mine big data to discover hidden information and extract more potential value. On the other hand, the 5V characteristics of big data, especially Volume (the sheer amount of data), bring challenges to storage and processing. For some traditional data mining algorithms, machine learning algorithms, and data profiling tasks, it is very difficult to handle such a large amount of data. Processing data at this scale is highly demanding of hardware resources and is time-consuming.

In many cases, primary key (PK) and foreign key (FK) constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
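
To make the relationship between keys and dependencies concrete, here is a minimal sketch of what it means for a column combination to be a valid UCC and for an IND to hold between two tables. It is not the HoPF algorithm and does not reproduce its score functions or pruning rules; the function names (is_ucc, is_ind) and the toy tables are hypothetical and chosen only for illustration. NULL handling is ignored for simplicity.

```python
def is_ucc(rows, columns):
    """Return True if no two rows share the same values on `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_ind(dep_rows, dep_cols, ref_rows, ref_cols):
    """Return True if every value combination of dep_cols also appears in ref_cols."""
    referenced = {tuple(r[c] for c in ref_cols) for r in ref_rows}
    return all(tuple(r[c] for c in dep_cols) in referenced for r in dep_rows)


# Hypothetical toy tables (illustration only, not from the evaluated datasets).
customers = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Oslo"},
    {"id": 3, "name": "Cy", "city": "Rome"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "quantity": 2},
    {"order_id": 11, "customer_id": 3, "quantity": 1},
    {"order_id": 12, "customer_id": 1, "quantity": 3},
]

# Both "id" and "name" are valid UCCs, but only one is the intended primary key.
ucc_candidates = [c for c in ("id", "name", "city") if is_ucc(customers, [c])]

# Both INDs below are valid, but only customer_id -> id is a real foreign key;
# quantity -> id holds purely by coincidence of small integer values.
ind_candidates = [
    c for c in ("order_id", "customer_id", "quantity")
    if is_ind(orders, [c], customers, ["id"])
]

print(ucc_candidates)
print(ind_candidates)
```

Even on this tiny example, more than one UCC and more than one IND are valid, which illustrates why some form of ranking is needed to separate true keys from coincidental candidates.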