Tom Kral

Identifying Preferences in the Clientele of a Coffeehouse

Analyses of sales data from a local business to identify patterns in items sold.

Project details

  • Techniques: Python, Market Basket Analysis, Visualization
  • Completed: Ongoing project
  • Github: View the repository
A local barista, Cocoon Coffee & More, commissioned me to analyze their sales data. Per request, I visualized the overall sales per quarter and the sales of cocktails in the summer of 2021. This gave the client gained more insights in sales than their current cash register system can provide. I also performed a Market Basket Analysis to see the products that are often bought together. This indicated that customers that buy pastries, probably also buy cappuccino or coffee. Based on these findings, I'm currently researching the type of pastry that has the strongest link.

Project Requirements

  • Assess the state and form of the data.
  • Visualize sales per quarter, year and category.
  • Visualize cocktail sales in the summer of 2021.
  • Perform a Market Basket Analysis.
  • Present the results in a understandable and usable manner to the client.

Project Overview

The owner of Cocoon Coffee & More has asked me to analyze his sales since he started the company in 2019. The current cash register system provides rudimentary insights in sales, but this is not always sufficient. Certain questions remain unanswered and an analysis with Python can change that. This project is also meant to gain familiarity with (big) data practices for a smaller company. It was my task to convey the advantages of this approach and, above all, answer the outstanding questions.

  1. What are the bestselling products per category, per quarter and per year?
  2. How well did the different cocktails sell in 2021?
  3. Which products are often bought together?
I used data from receipts as a starting point. These can be exported from the cash register system as CSV files. There are 45583 receipts in total, and these span the period between 10-05-2019 and 20-05-2022. I used a separate CSV file, also exported from this platform, detailing the different categories and corresponding products. The combination between these two datasets was used to perform the analyses on. Finally, I presented my findings to the client with a report and presentation. After this, he could decide if more research is needed, more advanced analyses have no added value to the business or if the results are satisfactory.

The Challenges

Some data wrangling was needed before analyses could be performed. The notation of bought products on the receipts, in raw form, was not suitable for counting and analysis right away. These had to be adjusted so Python could make sense of the different items and the amounts. Secondly, not all products had a clear category. About 15.7% of products had no category at all which makes subsequent analyses a little less accurate. This was handled by giving these products a label, indicating no group was assigned. Improving annotation of products was one of the recommendations given to the client.
A final challenge was presenting results in way that is easy to understand for the client. Results and technical terms must be conveyed in a way that is easy to read but doesn't leave valuable information out. I documented the outcomes of the analyses in a report and presented this to the client. Results are accompanied by an explanation of the important metrics so he has actionable insights about his business.

The Approach & Solution

The analyses themselves were performed with Python, and every research question is handled by a different script. This project is an introduction to data analyses at this point. More permanent solutions can be discussed later. Reading, adjusting and visualizing the data are all done by Python and its packages. Visualizing the data is done with Seaborn library1, which is an add-on for Python. For visualizing the quarterly sales, I choose a faceted grid2 with bar charts to show multiple categories at once while maintaining readability. I used the same approach for the cocktail sales, but used a time series instead.

overall sales
Overall sales per category in the first quarter of 2022
cocktail sales
Sales of cocktails in 2021

The Market Basket Analysis was done with the MLxtend library3, utilizing the apriori algorithm4 in combination with Python. This analyzes all receipts and returns association rules. These indicate which items are frequently bought together. It's also possible to deduce a likelihood of these products being bought in the same transaction. Observations like these can help with identifying keystone products or with product placement for example. The most prominent association rules are as follows.

Association Rule Support Confidence Lift
Cappuccino ⇒ Pastry 0.088 0.240 1.271
Pastry ⇒ Cappuccino 0.088 0.467 1.271
Coffee ⇒ Pastry 0.056 0.238 1.259
Pastry ⇒ Coffee 0.056 0.296 1.259
Coffee ⇒ Cappuccino 0.087 0.372 1.014
Cappuccino ⇒ Coffee 0.087 0.238 1.014

The Results

When the analyses were complete, the results were presented to the client with a printed report and a presentation on site. Large graphs were also delivered as files. Knowing which products sell well helps the client with decision making for the business. This gives insight in bestsellers and sales of seasonal specials like cocktails. The Market Basket Analysis indicates that sales of pastries and cappuccino and coffee are associated with each other. Lift values higher than 1 implies a link between the products on either side of the arrow. The Support value indicates the percentage of transactions that included the combination. However, for each combination the Confidence score differs between mirrored rules. This is the proportion of transactions containing the product X that also contain product Y. It can be used as an estimator. In this particular case, people often buy cappuccino/coffee together with a pastry. When looking at the Confidence value, someone purchasing a pastry is more likely to also buy cappuccino/coffee instead of the other way around.

Because pastries are identified as an estimator, the client wants me to continue analyzing its relationship with cappuccinos and coffees. I'm currently taking a closer look at this group of products. The cash register system comes with its own API, which gives more information about the specific type of pastry sold (scones, apple pie, etc.). Results will be added to this page when done.