Mastering SQL with Real-World Data Analysis: Famous Painting Dataset Project

Comprehensive Data Analysis on Famous Paintings Using SQL

If you are a data enthusiast or someone interested in art, you’ve probably wondered how SQL can help analyze art datasets. Today, I’m walking you through a fascinating SQL project based on Kaggle’s Famous Paintings dataset, showcasing real-world queries and insights. This blog will answer key questions like the popularity of painting styles, museum operations, and artist contributions using SQL queries. Let’s dive in!

Dataset Overview

The Famous Paintings Dataset on Kaggle provides data about:

  • Paintings and their attributes (style, price, size).
  • Museums showcasing these paintings.
  • Artists and their works.
  • Operational details like museum hours and location.

Key SQL Queries for Data Insights

Below are the SQL queries highlights that answered some analytical questions, along with explanations and outputs.

1. Which Paintings Are Not Displayed in Any Museum?

This query identifies paintings that are currently not displayed, enabling museums to improve their curation strategies.
SELECT DISTINCT name 
FROM work 
WHERE museum_id IS NULL;

2. Most Popular Artists and Museums

This ranks artists based on the number of paintings displayed, providing insights into global art trends.
SELECT t1.full_name, COUNT(t2.artist_id) AS count
FROM artist t1
LEFT JOIN work t2 ON t1.artist_id = t2.artist_id
GROUP BY t1.full_name
ORDER BY count DESC
LIMIT 5;

3. Paintings with Asking Prices Higher Than Regular Prices

Analyze pricing anomalies for valuable insights into the art market.
SELECT COUNT(*) AS cnt 
FROM product_size
WHERE sale_price > regular_price;

4. Canvas Size That Costs the Most

Ever wondered which canvas size fetches the highest price?
SELECT cs.label AS canvas_size, ps.sale_price
FROM canvas_size cs 
LEFT JOIN product_size ps 
ON cs.size_id = ps.size_id
WHERE ps.sale_price = (
    SELECT MAX(sale_price) 
    FROM product_size
);

5. Remove duplicate entries:


WITH CTE1 AS (
    SELECT artist_id, work_id, 
           ROW_NUMBER() OVER (PARTITION BY artist_id, work_id) AS rn 
    FROM work
)
SELECT * 
FROM work 
WHERE work_id IN (
    SELECT work_id 
    FROM CTE1 
    WHERE rn > 1
);

Explore my GitHub Repository for detailed SQL scripts, and don’t forget to check out my LinkedIn Post for a quick summary and networking opportunities!

Concluding Thoughts

This project beautifully combines art and analytics, providing insights into museums, artists, and painting trends. If you’re keen on art or want to explore SQL for real-world projects, this dataset is a great place to start.

Get Involved

Ready to analyze the art world? Head over to Kaggle, download the dataset, and try these queries yourself!

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top