Nova School of Business and Economics
Abstract
This study addresses critical industrial challenges in e-commerce product
categorization, namely platform heterogeneity and the structural limitations of
existing taxonomies, by developing and deploying a multimodal hierarchical
classification framework. Using a dataset of 271,700 products from 40
international fashion e-commerce platforms, we integrate textual features
(RoBERTa), visual features (ViT), and joint vision--language representations
(CLIP). We compare early, late, and attention-based fusion strategies
within a hierarchical architecture enhanced by dynamic
masking to ensure taxonomic consistency. Results show that CLIP embeddings
combined via an MLP-based late-fusion strategy achieve the highest hierarchical
F1 (98.59\%), outperforming unimodal baselines. To address shallow or
inconsistent categories, we further introduce a self-supervised ``product
recategorization'' pipeline using SimCLR, UMAP, and cascade clustering, which
discovered new, fine-grained categories (e.g., subtypes of ``Shoes'') with
cluster purities above 86\%. Cross-platform experiments reveal a
deployment-relevant trade-off: complex late-fusion methods maximize accuracy
with diverse training data, while simpler early-fusion methods generalize more
effectively to unseen platforms. Finally, we demonstrate the framework's
industrial scalability through deployment in EURWEB's commercial transaction
intelligence platform via a two-stage inference pipeline, combining a
lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance
cost and accuracy.
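
The dynamic masking mentioned above can be illustrated with a minimal sketch. It assumes a two-level taxonomy and masks child-level logits so that only children of the predicted parent can receive probability mass. The taxonomy, the category names, and the functions `masked_softmax` and `predict_child` are hypothetical and chosen for illustration; they are not taken from the study's implementation.

```python
import math

# Hypothetical two-level taxonomy: parent category -> allowed child categories.
TAXONOMY = {
    "Shoes": ["Sneakers", "Boots"],
    "Tops": ["T-Shirts", "Shirts"],
}
# Flat list of all child labels, in the order the child logits are assumed to use.
CHILDREN = [child for kids in TAXONOMY.values() for child in kids]


def masked_softmax(logits, allowed):
    """Softmax over `logits`, giving zero probability to disallowed positions."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def predict_child(parent, child_logits):
    """Pick the most probable child among those valid for the given parent."""
    allowed = [child in TAXONOMY[parent] for child in CHILDREN]
    probs = masked_softmax(child_logits, allowed)
    best = max(range(len(CHILDREN)), key=probs.__getitem__)
    return CHILDREN[best], probs


# The raw argmax would be "T-Shirts", which is not a child of "Shoes".
label, probs = predict_child("Shoes", [0.2, 1.0, 3.0, 0.5])
print(label, probs)
```

With the mask applied, the prediction is restricted to the children of ``Shoes'', so a parent and child that contradict each other cannot be emitted together.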
Abstract
The demand for text classification is growing rapidly in web search, data
mining, web ranking, recommendation systems, and many other areas of
information technology. This paper illustrates the text classification
process on several datasets using standard supervised machine learning
techniques. Supervised classification learns from labeled text documents, so
a range of classifiers is applied to different kinds of labeled documents and
the accuracy of each classifier is measured. An Artificial Neural Network
(ANN) model based on a Back Propagation Network (BPN) is used alongside
several other models to create an independent platform for supervised
classification of labeled text. Classification performance on labeled
documents is analyzed with an existing benchmark approach. Experimental
analysis on real data reveals which models perform well in terms of
classification accuracy.
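
The evaluation loop described above (train on labeled documents, then measure accuracy on held-out labeled documents) can be sketched as follows. This is a minimal illustration only: a multinomial Naive Bayes classifier is used as a stand-in because it fits in a few lines, whereas the paper's own models include a back-propagation ANN that is not reproduced here. The toy documents and the function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Toy labeled data, invented purely for illustration.
train_docs = ["cheap flights and hotel deals", "book a hotel room",
              "football match score", "team wins the match"]
train_labels = ["travel", "travel", "sport", "sport"]
test_docs = ["hotel deals", "match score"]
test_labels = ["travel", "sport"]

model = train_naive_bayes(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))
```

Any other supervised model could be compared under the same protocol by swapping in its own training and prediction functions while keeping `accuracy` unchanged.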