Identifying Preferences in the Clientele of a Coffeehouse
Analyses of sales data from a local business to identify patterns in items sold.
Project Overview
The owner of Cocoon Coffee & More has asked me to analyze his sales since he started the company in 2019. The current cash register system provides rudimentary insights in sales, but this is not always sufficient. Certain questions remain unanswered and an analysis with Python can change that. This project is also meant to gain familiarity with (big) data practices for a smaller company. It was my task to convey the advantages of this approach and, above all, answer the outstanding questions.
- What are the bestselling products per category, per quarter and per year?
- How well did the different cocktails sell in 2021?
- Which products are often bought together?
The Challenges
Some data wrangling was needed before analyses could be performed. The notation of bought products on the receipts, in raw form, was not suitable for counting and analysis right away. These had to be adjusted so Python could make sense of the different items and the amounts. Secondly, not all products had a clear category. About 15.7% of products had no category at all which makes subsequent analyses a little less accurate. This was handled by giving these products a label, indicating no group was assigned. Improving annotation of products was one of the recommendations given to the client.
A final challenge was presenting results in way that is easy to understand for the client. Results and technical terms must be conveyed in a way that is easy to read but doesn't leave valuable information out. I documented the outcomes of the analyses in a report and presented this to the client. Results are accompanied by an explanation of the important metrics so he has actionable insights about his business.
The Approach & Solution
The analyses themselves were performed with Python, and every research question is handled by a different script. This project is an introduction to data analyses at this point. More permanent solutions can be discussed later. Reading, adjusting and visualizing the data are all done by Python and its packages. Visualizing the data is done with Seaborn library1, which is an add-on for Python. For visualizing the quarterly sales, I choose a faceted grid2 with bar charts to show multiple categories at once while maintaining readability. I used the same approach for the cocktail sales, but used a time series instead.
The Market Basket Analysis was done with the MLxtend library3, utilizing the apriori algorithm4 in combination with Python. This analyzes all receipts and returns association rules. These indicate which items are frequently bought together. It's also possible to deduce a likelihood of these products being bought in the same transaction. Observations like these can help with identifying keystone products or with product placement for example. The most prominent association rules are as follows.
Association Rule | Support | Confidence | Lift |
---|---|---|---|
Cappuccino ⇒ Pastry | 0.088 | 0.240 | 1.271 |
Pastry ⇒ Cappuccino | 0.088 | 0.467 | 1.271 |
Coffee ⇒ Pastry | 0.056 | 0.238 | 1.259 |
Pastry ⇒ Coffee | 0.056 | 0.296 | 1.259 |
Coffee ⇒ Cappuccino | 0.087 | 0.372 | 1.014 |
Cappuccino ⇒ Coffee | 0.087 | 0.238 | 1.014 |
The Results
When the analyses were complete, the results were presented to the client with a printed report and a presentation on site. Large graphs were also delivered as files. Knowing which products sell well helps the client with decision making for the business. This gives insight in bestsellers and sales of seasonal specials like cocktails. The Market Basket Analysis indicates that sales of pastries and cappuccino and coffee are associated with each other. Lift values higher than 1 implies a link between the products on either side of the arrow. The Support value indicates the percentage of transactions that included the combination. However, for each combination the Confidence score differs between mirrored rules. This is the proportion of transactions containing the product X that also contain product Y. It can be used as an estimator. In this particular case, people often buy cappuccino/coffee together with a pastry. When looking at the Confidence value, someone purchasing a pastry is more likely to also buy cappuccino/coffee instead of the other way around.
Because pastries are identified as an estimator, the client wants me to continue analyzing its relationship with cappuccinos and coffees. I'm currently taking a closer look at this group of products. The cash register system comes with its own API, which gives more information about the specific type of pastry sold (scones, apple pie, etc.). Results will be added to this page when done.