Music Genre Classification
Machine Learning | Audio Processing | Deep Learning | CNNs
This project focuses on building a model that classifies songs into musical genres based on their audio characteristics. Using deep learning and signal processing techniques, it demonstrates how raw sound data can be transformed into meaningful predictions.
Overview
To automatically classify music tracks into genres (e.g., rock, classical, hip-hop) using spectrogram-based feature representations and deep learning models, especially Convolutional Neural Networks (CNNs).
Goal
What I Did
Data Preprocessing
Used the GTZAN genre dataset, a benchmark dataset in audio ML.
Converted raw .wav files into Mel spectrograms, which visually represent audio frequency content over time.
Feature Representation
Treated mel spectrograms as grayscale images and resized them into consistent 128x128 formats for CNN input.


Generating Mel spectrograms
Model Development
Designed and trained a custom CNN to classify genres based on visual patterns in the spectrograms.
Applied dropout and normalization to reduce overfitting and stabilize training


Custom CNN Architecture
Evaluation
Validation was used during training.
Measured performance using Accuracy, Precision, Recall, F1-score, and confusion matrices.
Visualized predictions and genre-wise misclassifications to interpret the model’s behavior.
Music Classification Confusion Matrix


Performance Summary
Best Test Accuracy: 57.16% (Hip-Hop, Instrumental, International, and Pop genres)
Test Loss (Cross Entropy): ~1.43
Precision & Recall (Genre-specific):
Class 2 (Hip-Hop):
Precision: 51.5%
Recall: 77.0%
F1-score: 61.7%
Class 3 (Instrumental):
Precision: 80.7%
Recall: 60.8%
F1-score: 69.3%
Takeaways
Temporal features like delta and delta² helped the CNN capture subtle changes in audio textures.
Higher learning rates (0.01) led to faster convergence and better test accuracy compared to 0.001.
Genre confusion revealed musical overlaps—e.g., electronic and rock often misclassified due to similar instrumentation.
Tools and Technologies Used
Python, NumPy, Matplotlib
Librosa (audio processing)
TensorFlow / Keras (deep learning)
Scikit-Learn (classification metrics)
GTZAN Dataset
This project was completed as part of the course "DS-4213: Data Mining," taught by Dr. Bo Hui at the University of Tulsa in the spring 2025 semester.
My full code for this project can be found here on my GitHub.