SQL coding challenge #1: Top 10 customers, by total payments amount.

Before getting into the technical, I want to say “Hi”. If you are new to reading my articles, welcome, and I hope this will be useful and/or fun for you. For all readers, if you like this article or you found it useful, please CLAP! It will let me know you read it, and it’s a great way to connect. Even better, leave a comment. I always make sure to respond when I see it.

Here’s the coding challenge, on CodeWars, if you want to give it a try on your own or code along: click here. Coding challenges are a fun and efficient way to improve on your coding skills, and today we’re tackling SQL.

Problem: Top 10 customers, by total payments amount.

Your are working for a company that wants to reward its top 10 customers with a free gift. You have been asked to generate a simple report that returns the top 10 customers by total amount spent ordered from highest to lowest. Total number of payments has also been requested.

The query should output the following columns:
- customer_id [int4]
- email [varchar]
- payments_count [int]
- total_amount [float]

and has the following requirements:
- only returns the 10 top customers, ordered by total amount spent from highest to lowest

Database Schema:

Code:

SELECT
c.customer_id,
SUM(CAST(p.amount as float)) AS total_amount,
c.email,
COUNT(p.amount) AS payments_count
FROM
customer c INNER JOIN payment p
ON c.customer_id=p.customer_id
GROUP BY c.customer_id
ORDER BY total_amount desc
LIMIT 10;

Code explained:

If you are a beginner, the SQL above may look intimidating. But let me assure you, I myself did not just write that from the beginning to end, the way you see it here. I built it, starting with something very simple, such as selecting columns from the tables. Let’s run through how you can build it yourself, by thinking about what the problem is asking us to do, and following the logic of the problem step by step, then translating that into SQL code.

The query should output the following columns:
- customer_id [int4]
- email [varchar]
- payments_count [int]
- total_amount [float]

Let’s start by determining which columns we need and what tables they are in, and how these tables are related to each other. The problem asks us for customer_id. This column is in the customer table, check the database schema above. It is also asking us for the email, which is also in the customer table. Next, the problem is asking us to get the payments_count and the total_amount. If we check the schema, these columns do not exist. Looking closer at the problem details, we see that these are totals calculated from the customers’ payments, which are captured in the payment table and the amount column in this table. For each customer, we can get the total amount and the total count by adding the amounts together for the total_amount, and counting the number of amounts or payments for the payments_count. When we start adding or counting things for each customer we can start thinking of aggregation, which means we are doing addition and counting for each customer to get one result.

So, looking at our thought process above, we see that we will need the following columns/tables:

  • customer_id from customer
  • email from customer
  • amount from payment

Now that we know which columns we need and what tables we will be using, we need to think about what is the common element that makes these tables relate to each other, what column do they have in common… and se can see, from the database schema, that it is customer_id. When we think about bringing two tables together through a common element we can start thinking of joining them together on their common element, specifically an inner join with the condition that customer_id is a match on both.

So, let’s write a simple select with what we know so far, to get the columns we need from their respective tables:

SELECT customer_id, email
FROM customer;

SELECT amount
FROM payment;

Looking at the rows our select code returned above, we are getting the columns we want, however they are still separate results. Next, we need to join these two results together by matching on the customer_id, from both tables, customer and payment. So, let’s add to our basic code and join customer and payment on customer_id. We can also limit our output to ten records by using LIMIT.

SELECT customer.customer_id, customer.email, payment.amount
FROM customer
INNER JOIN payment
ON customer.customer_id=payment.customer_id
LIMIT 10;

Now we are seeing a result that is closer to what we are looking for. Remember, we want customer_id, email, total_amount, payments_count. We have the first two, customer_id and email. To get the last two, we need to calculate them from the amount. For each customer, so for all rows with the same customer_id, we need to add up all the amounts to get the total_amount for the customer. Similarly, for each customer, so for all rows with the same customer_id, we need to count all the amounts, i.e. rows, to the the payments_count. We are ready to add another element to our SQL code above, and that is aggregation. In SQL we can apply an operation like SUM or COUNT to a set of rows that have the same value for a column by using GROUP BY. Using the output above, let’s translate what that means for the customer with customer_id 1. Basically, we are taking all the rows where the customer_id is 1, and then we get the total_amount by SUMing all the amount values for those rows together. Similarly, we are taking all the rows where the customer_id is 1, and then we get the payments_count by COUNTing how many rows there are. Now, let’s code this!

SELECT customer.customer_id, customer.email,
SUM(payment.amount) AS total_amount,
COUNT(payment.amount) AS payments_total
FROM customer
INNER JOIN payment
ON customer.customer_id=payment.customer_id
GROUP BY customer.customer_id
LIMIT 10;

It’s starting to look closer to our end goal. Let’s remember what we’re going for here…

The query should output the following columns:
- customer_id [int4]
- email [varchar]
- payments_count [int]
- total_amount [float]

and has the following requirements:
- only returns the 10 top customers, ordered by total amount spent from highest to lowest

We are getting the right column back, customer_id, email, payments_count, and total_amount. The instructions also say to return only the top 10 customers, ordered by total amount from highest to lowest, meaning descending order. In SQL, we use ORDER BY to order rows, followed by the column we want order by, in our case it would be total_amount.

Notice, I used the wrong name for the payments count variable, payments_total instead of the payments_count, so let’s make sure we use the correct variable in the next version of our code :-). It’s good to experience mistakes, and learn to catch and correct them.

Let’s develop our code a little more, by adding the ORDER BY clause.

SELECT customer.customer_id, customer.email,
COUNT(payment.amount) AS payments_count,
SUM(payment.amount) AS total_amount
FROM customer
INNER JOIN payment
ON customer.customer_id=payment.customer_id
GROUP BY customer.customer_id
ORDER BY total_amount DESC
LIMIT 10;

We have all the columns, as requested in the problem. And our result is showing the top 10 rows with the highest total_amount. Looks like we’re done… except that we are not :-). The coding challenge on CodeWars is still returning one little issue with the code, and it is subtle, but it is a requirement that is present in the instructions.

The query should output the following columns:
- customer_id [int4]
- email [varchar]
- payments_count [int]
- total_amount [float]

The variable/column total_amount must be a float, and right now our code is not returning it in the appropriate format. We can fix this by casting to float, using the CAST function in SQL. Let’s add that to our code above!

SELECT customer.customer_id, customer.email,
COUNT(payment.amount) AS payments_count,
SUM(CAST(payment.amount AS FLOAT)) AS total_amount
FROM customer
INNER JOIN payment
ON customer.customer_id=payment.customer_id
GROUP BY customer.customer_id
ORDER BY total_amount DESC
LIMIT 10;

And that’s it! There are no more issues, and we have written code that meets all the requirements of the problem stated. And here it is, in all of it’s glory!

SELECT c.customer_id, c.email,
COUNT(p.amount) as payments_count,
SUM(CAST(p.amount AS FLOAT)) as total_amount
FROM customer c
INNER JOIN payment p
ON c.customer_id=p.customer_id
GROUP BY c.customer_id
ORDER BY total_amount DESC
LIMIT 10;

There is one change I made, and that is I nicknamed, i.e. aliased, my tables as c for customer and p for payment. It makes the code less cluttered and easier to read.

If you like this article or you found it useful, please CLAP! It will let me know you read it, and it’s a great way to connect with my readers. Even better, leave a comment. I always make sure to respond when I see it. Until next time, Happy Learning!

I am a Data Scientist with 10+ years experience in the Tech Industry, and a background in Education and Customer Service.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store