Ulster University

Transformation of XML Data Sources for Sequential Path Mining

McNerlan, Ruth, Bi, Yaxin, Zhao, Gouge and Hang, Bing (2017) Transformation of XML Data Sources for Sequential Path Mining. In: International Workshop on Graph Data Management and Analysis (GDMA 2017), Beijing, China. Lecture Notes in Computer Science, Springer. 10 pp. [Conference contribution]

Text - Accepted Version (714kB)
Text - Supplemental Material (94kB), indefinitely restricted to repository staff only.

Abstract

In recent years XML has become one of the most promising ways to define semi-structured data. Data mining techniques devised for detecting interesting patterns in semi-structured data have also grown in popularity, but applying such techniques to XML data can be problematic because of its hierarchical structure. It has therefore become necessary to transform XML into flattened path data so that data mining can be carried out efficiently. However, problems may arise when the XML tree needs to be reconstructed from the traversal path. There are currently many transformation techniques for XML data, many of which take advantage of its tree-like hierarchical structure, but most of these approaches do not allow the XML tree to be reconstructed from the traversal path. In this paper we propose a new approach to the transformation of XML data into path data. The new approach employs a five-step transformation process along with a new ‘Postorder Sequencing’ method of traversing the XML tree. On the one hand, the proposed method can be seen as an efficient and effective way of transforming XML data into collections of paths; on the other hand, it enables XML trees to be generated from the traversal paths.
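The abstract does not spell out the five transformation steps or the exact encoding used by ‘Postorder Sequencing’. Purely as an illustration of the general idea (not the authors' method), the Python sketch below flattens an XML tree into root-to-node tag paths emitted in postorder and then rebuilds the tree from that sequence. The function names `postorder_paths` and `rebuild` are hypothetical, and attributes and tail text are ignored for brevity. Reconstruction works because postorder emits every child before its parent and each path's length gives the node's depth, so a single stack is enough to reattach children.

```python
import xml.etree.ElementTree as ET


def postorder_paths(root):
    """Flatten an element tree into (path, text) pairs in postorder.

    Each path is the tuple of tags from the root down to the node, so a
    node's depth is len(path) - 1. Attributes and tails are ignored.
    """
    out = []

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        for child in elem:  # emit all children before the parent
            visit(child, path)
        out.append((path, (elem.text or "").strip()))

    visit(root, ())
    return out


def rebuild(pairs):
    """Rebuild an element tree from postorder (path, text) pairs."""
    stack = []  # (depth, element) pairs still waiting for their parent
    for path, text in pairs:
        depth = len(path) - 1
        node = ET.Element(path[-1])
        node.text = text or None
        children = []
        # In postorder, pending nodes exactly one level deeper are this node's children.
        while stack and stack[-1][0] == depth + 1:
            children.append(stack.pop()[1])
        node.extend(children[::-1])  # restore original sibling order
        stack.append((depth, node))
    if len(stack) != 1:
        raise ValueError("sequence does not describe a single rooted tree")
    return stack[0][1]


if __name__ == "__main__":
    doc = ET.fromstring(
        "<lib><book><title>A</title></book>"
        "<book><title>B</title><year>2017</year></book></lib>"
    )
    for path, text in postorder_paths(doc):
        print("/".join(path), repr(text))
    print(ET.tostring(rebuild(postorder_paths(doc)), encoding="unicode"))
```

Note that the paths alone would not distinguish two sibling `book` elements from one `book` with merged children; it is the postorder position of each parent path that makes the structure recoverable here.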

Item Type: Conference contribution (Paper)
Keywords: XML, Transformation, XPath, Sequential Data Mining
Faculties and Schools: Faculty of Computing & Engineering
Faculty of Computing & Engineering > School of Computing and Mathematics
Research Institutes and Groups: Computer Science Research Institute
Computer Science Research Institute > Artificial Intelligence and Applications
ID Code: 38942
Deposited By: Dr Yaxin Bi
Deposited On: 03 Nov 2017 14:05
Last Modified: 03 Nov 2017 14:05
