Contrasting Classical and Symbolic Approaches to Classification in Data Warehousing Systems: Application to Metabolic Health Profiling
DOI:
https://doi.org/10.5281/zenodo.15657368
Keywords:
Complex data, Tuning, Data warehouse, Symbolic data analysis, Clustering.
Abstract
In the era of Big Data, the diversity and volume of data in enterprise data warehouses pose significant challenges for accurate and robust classification. Data mining, which encompasses classification, clustering, and association-rule techniques, is a cornerstone for extracting actionable insights from these repositories. Traditional analytic paradigms, however, often falter when confronted with real-world imperfections such as multi-valued attributes, measurement imprecision, and aggregated summaries. Symbolic Data Analysis (SDA) addresses these shortcomings by representing entities as complex “symbolic objects” (e.g., intervals, distributions, or sets), thereby preserving inherent variability and uncertainty. In this work, we first develop a star-schema data warehouse built on medical parameters (BMI, blood glucose, etc.) and implement an ETL pipeline to ensure data consistency and integrity. We then apply to the same datasets both classical algorithms (e.g., k-means, decision trees, and support vector machines) and a novel symbolic dynamic classification framework, in which class prototypes are defined as hyperrectangular envelopes of feature intervals. Our evaluation indicates that symbolic approaches handle data imprecision better and provide richer interpretability, while classical methods remain computationally efficient. The results are validated using accuracy, inertia-based metrics, and clinical interpretability, offering actionable insights for data warehousing applications.
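To make the idea of a hyperrectangular class prototype concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each patient is described by one interval per feature, builds each class prototype as the smallest hyperrectangle enclosing its members, and assigns a new object to the nearest prototype. The Hausdorff distance between intervals is an assumption chosen for illustration (the abstract does not name a dissimilarity), and all names and numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]      # (lower, upper) bound of one feature
SymbolicObject = List[Interval]     # one interval per feature


def envelope(members: List[SymbolicObject]) -> SymbolicObject:
    """Smallest hyperrectangle that contains every member's intervals."""
    n_features = len(members[0])
    return [
        (min(m[j][0] for m in members), max(m[j][1] for m in members))
        for j in range(n_features)
    ]


def hausdorff(a: SymbolicObject, b: SymbolicObject) -> float:
    """Sum over features of the Hausdorff distance between two intervals.

    Assumed dissimilarity for this sketch; features are left unscaled.
    """
    return sum(
        max(abs(lo_a - lo_b), abs(hi_a - hi_b))
        for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b)
    )


def assign(obj: SymbolicObject, prototypes: Dict[str, SymbolicObject]) -> str:
    """Return the label of the prototype closest to obj."""
    return min(prototypes, key=lambda label: hausdorff(obj, prototypes[label]))


# Hypothetical interval data: (BMI range, blood glucose range) per patient.
classes = {
    "normal": [[(19.0, 23.0), (75.0, 90.0)], [(21.0, 24.5), (80.0, 98.0)]],
    "at_risk": [[(27.0, 31.0), (100.0, 118.0)], [(29.5, 34.0), (105.0, 125.0)]],
}
prototypes = {label: envelope(members) for label, members in classes.items()}

patient = [(28.0, 30.0), (102.0, 110.0)]
label = assign(patient, prototypes)
print(prototypes)
print(label)
```

In a dynamic (iterative) scheme, the assignment and envelope steps would alternate until the partition stops changing. In practice the features would likely need rescaling first, since blood glucose spans a much wider numeric range than BMI and would otherwise dominate the distance.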
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.