Lait Equitable: A Data-Driven Approach to Fair Trade milk

Author

Jayesh Smith and Emeline Raimon-Dacunha-Castelle Urs Hurni

Published

May 28, 2024

Abstract

The following project focuses on the analysis of the Lait Equitable dataset, which contains information on the production of fair trade milk in Switzerland. The goal of this project is to analyze the dataset and identify trends and patterns in the data that can help us better understand the production of fair trade milk in Switzerland. We will use a variety of data analysis techniques, including exploratory data analysis, data visualization, and statistical modeling, to analyze the dataset and draw conclusions about the production of fair trade milk in Switzerland.

1 Introduction

1.1 Company Background

Lait Equitable, also known as Fair Milk, is a transformative Swiss initiative designed to ensure fair compensation for dairy producers across Switzerland. Founded in response to a prolonged crisis in the dairy industry, Lait Equitable aims to address the substantial disparity between the cost of milk production and the compensation received by producers. Historically, the plummeting prices of milk have often failed to cover production costs, leading to financial stress among producers. The cooperative structure of Lait Equitable ensures that each liter of milk is sold for 1 CHF, aligning closely with the actual production cost of approximately 98 cents per liter as estimated by Agridea in 2016.

Key Points:

Foundation: Lait Equitable was established to counteract the severe financial difficulties faced by Swiss milk producers due to inadequate compensation for their products. This initiative emerged as a critical response to the drastic reduction in the number of operational dairy farms, which saw a decline from about 44,000 producers 25 years ago to 17,164 by the end of 2023.
Operation: The initiative operates through the “Lait Equitable” cooperative. This entity is responsible for collecting, processing, and distributing milk, ensuring that producers receive fair compensation. It is noteworthy that while the cooperative guarantees 1 CHF per liter, the actual market price often falls significantly below this, making the cooperative’s top-up essential for fair compensation.
Product Range: Under the brand Faireswiss, Lait Equitable offers a range of dairy products available at various retail outlets. While the milk in Faireswiss products may not always directly come from the cooperative members due to logistical reasons, the fair compensation model remains intact.
Membership: The cooperative is inclusive, welcoming all Swiss milk producers, including those involved in specialty products like Gruyère cheese. While the cooperative initially included cheese producers, decisions made in recent general assemblies have temporarily restricted this inclusion, subject to future reassessment.

This initiative not only supports the livelihood of local farmers but also forms part of a broader European movement towards fair milk pricing, coordinated by the European Milk Board (EMB). Lait Equitable serves as a beacon of hope and a model for sustainable agricultural practices in the dairy industry, ensuring producers can live off their work dignifiedly.

Source - faireswiss.ch

1.2 Research questions

Price Differences: To analyze the price differences between conventional and organic milk and their impact on market demand and supply.
Demand for Organic Milk: To study how the demand for organic milk has evolved in recent years and identify the driving factors behind this trend.
Sales Analysis Across Manors: To determine the factors contributing to the success of Lait Equitable’s milk in some Manor stores but not in others.

2 Data

2.1 Market for dairy products in Switzerland

The dairy market is an important sector of Swiss agriculture and the agri-food industry. The dairy industry accounts for around 20% of agricultural production.

Our data source is the Swiss Federal Office for Agriculture, OFAG. The dataset shows price trends on the dairy market at production level. The farmgate milk price is calculated on the basis of a monthly survey of the main milk buyers. These are the companies that buy milk directly from producers. International milk price trends are also shown for comparison purposes, we will look at this later.

Source - agrimarche

2.1.1 Wrangling and Cleaning

We start by removing the columns with no value or unique value, and the variables that are not relevant for our analysis. After proceeding, we end up with a sub-data set of 6 variables.

Click to show code

# Importer les données
data <- read.csv('../data/Données_marché_Lait.csv')

# Faire une copie des données
clean_data <- data

# Supprimer les colonnes avec des valeurs uniques
columns_to_drop <- c('Devise', 'Composants.de.coûts', 'Devise', 'Type.de.données', 'Source.de.données', 
                     'Groupe.de.produits', 'Commerce.extérieur', 'Indicateur', 'Marché', 
                     'Propriétés.du.produit', "Mode.d.utilisation", 'Unité', 'Produit', 
                     'Echelon.de.création.de.valeur', 'Echelon.de.création.de.valeur_Détail')

clean_data <- clean_data %>%
  select(-all_of(columns_to_drop))

clean_data$Date <- as.Date(paste0("01-", clean_data$Date), format="%d-%m-%Y")

# Renommer les colonnes
clean_data <- clean_data %>%
  rename(
    `System of production` = `Système.de.production`,
    `Product origin` = `Provenance.du.produit`,
    `Product subgroup` = `Sous.groupe.de.produits`,
    `Sales region` = `Région.de.vente`,
    `Price` = `Prix`
  )

# Changer les valeurs catégorielles dans les colonnes spécifiées
clean_data <- clean_data %>%
  mutate(`System of production` = recode(`System of production`, 'Conventionnel' = 'Conventional'),
         `Product origin` = recode(`Product origin`,
                                   'Région 1' = 'Region 1',
                                   'Région 2' = 'Region 2',
                                   'Région 3' = 'Region 3',
                                   'Région 4' = 'Region 4',
                                   'Région 5' = 'Region 5',
                                   'Suisse' = 'Switzerland',
                                   'Reste du monde, non suisse' = 'Rest of the world, non-Swiss',
                                   'Nouvelle-Zélande' = 'New Zealand',
                                   'Italie' = 'Italy',
                                   'UE' = 'EU',
                                   'Allemagne' = 'Germany',
                                   'Autriche' = 'Austria'),
         `Product subgroup` = recode(`Product subgroup`,
                                     'Lait cru CH' = 'CH Milk ',
                                     'Lait cru, International' = 'International Milk'),  # Ajustez selon vos données
         `Sales region` = recode(`Sales region`,
                                 'Suisse' = 'Switzerland',
                                 'Nouvelle-Zélande' = 'New Zealand',
                                 'Italie' = 'Italy',
                                 'UE-28' = 'EU-28',
                                 'Allemagne' = 'Germany',
                                 'Autriche' = 'Austria'))  # Ajustez selon vos données

# Définir le chemin exact où enregistrer le fichier
file_path <- '../data/clean_data.csv'

# Enregistrer le DataFrame dans un fichier CSV à l'emplacement spécifié
write.csv(clean_data, file = file_path, row.names = FALSE)

Click to show code

library(reactable)
library(readr)

# Lire le fichier CSV
clean_data <- read_csv("../data/clean_data.csv", show_col_types = FALSE)

# Afficher les données en utilisant reactable
reactable(clean_data)

2.1.2 Description

The data covers the period from January 2001 to January 2024. It provides information about the system of production used : conventional, bio, or IP Swiss. The origin and the sales region of the milk include Switzerland and its regions, as well as other international countries. This will enable us to carry out comparative analyses of Swiss and international milk prices. The ‘Price’ column represents the weighted average prices in centimes per quantity, mainly at the farm.

Click to show code

# Créez une table tibble avec des descriptions des variables
variable_table <- tibble(
  Variable = c("Date", "System of production", "Product origin", "Product subgroup", "Sales region", "Price"),
  Description = c(
    "The date when the data was recorded, in a year-month-day format.",
    "The system of production for the product.",
    "The origin of the product.",
    "The subgroup to which the product belongs.",
    "The region where the product is sold.",
    "The price in centimes of the product."
  )
)

# Affichez la table avec kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center", bootstrap_options = c("striped", "bordered", "hover", "condensed"))

Variable	Description
Date	The date when the data was recorded, in a year-month-day format.
System of production	The system of production for the product.
Product origin	The origin of the product.
Product subgroup	The subgroup to which the product belongs.
Sales region	The region where the product is sold.
Price	The price in centimes of the product.

2.1.3 Missing Values

Then, we calculate the percentage of missing values for each characteristics and filter out those without.

Click to show code

# Calculer le pourcentage de NaN pour chaque colonne
nan_percent <- clean_data %>%
  summarise(across(everything(), ~mean(is.na(.)) * 100)) %>%
  pivot_longer(everything(), names_to = "Colonne", values_to = "Pourcentage_de_NaN")

# Filtrer pour garder seulement les colonnes avec NaN
nan_percent <- nan_percent %>%
  filter(Pourcentage_de_NaN > 0)

The columns with missing values on our sub data set are: Sales region and System of production. The ‘System of production’ has a higher percentage of missing values, around 37.6%. However, this is a key variable in our analysis so we keep it and replace missing information ‘NaN’ with ‘Unknown’ for clarity. As for Sales region, the percentage of missing values is relatively low thus we keep it.

2.2 Swiss dairy products market

2.2.1 Wrangling & Cleaning

The main objective of this project is to focus on milk producers in Switzerland and more specifically on their remuneration. To do this, we are going to create a sub data set ‘swiss_production_data’ that will only keep the Swiss regions in ‘Product origin’.

Click to show code

# Sélectionner les colonnes spécifiées
selected_columns <- c('Date', 'System of production', 'Product origin', 'Price')
subset_data <- clean_data %>%
  select(all_of(selected_columns))

# Définir les régions à conserver
regions_to_keep <- c('Region 1', 'Region 2', 'Region 3', 'Region 4', 'Region 5')
swiss_production_data <- subset_data %>%
  filter(`Product origin` %in% regions_to_keep)

# Remplacer les valeurs manquantes par 'Unknown'
swiss_production_data <- swiss_production_data %>%
  mutate(across(everything(), ~replace_na(., 'Unknown')))

# Définir le chemin exact où enregistrer le fichier
file_path <- '../data/swiss_production_data.csv'

# Enregistrer le DataFrame dans un fichier CSV à l'emplacement spécifié
write.csv(swiss_production_data, file = file_path, row.names = FALSE)

Click to show code

# Lire le fichier CSV
swiss_production_data <- read_csv("../data/swiss_production_data.csv", show_col_types = FALSE)

# Afficher les données en utilisant reactable
reactable(swiss_production_data)

2.2.2 Description

Here is the geographical distribution of the 5 regions in Switzerland.

Click to show code

# Créez une table tibble avec des descriptions des variables
table <- tibble(
  Variable = c("Region 1", "Region 2", "Region 3", "Region 4", "Region 5"),
  Description = c(
    "Geneva, Vaud, Fribourg, Neuchâtel, Jura and the French-speaking parts of the canton of Bern (administrative district of Bernese Jura).",      "Bern, except the administrative district of Jura Bernois, Lucerne, Unterwalden : Obwalden. Nidwalden, Uri, Zug and part of the canton of      Schwyz (district of Schwyz, Gersau and Küssnacht).",
    "Basel-Landschaft and Basel-Stadt, Aargau and Solothurn",
    "Zurich, Schaffhausen, Thurgau. Appenzell (Inner and Outer Rhodes), St. Gallen, part of Canton Schwyz (districts of Einsiedeln, March and       Höfe), Glarus, Graubünden",
    "Valais and Ticino"
  )
)

# Affichez la table avec kableExtra
table %>%
  kbl() %>%
  kable_styling(position = "center", bootstrap_options = c("striped", "bordered", "hover", "condensed"))

Variable	Description
Region 1	Geneva, Vaud, Fribourg, Neuchâtel, Jura and the French-speaking parts of the canton of Bern (administrative district of Bernese Jura).
Region 2	Bern, except the administrative district of Jura Bernois, Lucerne, Unterwalden : Obwalden. Nidwalden, Uri, Zug and part of the canton of Schwyz (district of Schwyz, Gersau and Küssnacht).
Region 3	Basel-Landschaft and Basel-Stadt, Aargau and Solothurn
Region 4	Zurich, Schaffhausen, Thurgau. Appenzell (Inner and Outer Rhodes), St. Gallen, part of Canton Schwyz (districts of Einsiedeln, March and Höfe), Glarus, Graubünden
Region 5	Valais and Ticino

2.3 Lait Equitable Sales

2.3.1 Dataset Sales 2023

2.3.1.1 Wrangling and Cleaning

Click to show code

file_path = '../data/all_products_sales_per_stores_2023.xlsx'
df = pd.read_excel(file_path, sheet_name='Sheet1')

df.columns = df.columns.astype(str)
# Renaming the monthly columns for easier readability
new_column_names = {
    '2023-01-01 00:00:00': 'Jan 2023',
    '2023-02-01 00:00:00': 'Feb 2023',
    '2023-03-01 00:00:00': 'Mar 2023',
    '2023-04-01 00:00:00': 'Apr 2023',
    '2023-05-01 00:00:00': 'May 2023',
    '2023-06-01 00:00:00': 'Jun 2023',
    '2023-07-01 00:00:00': 'Jul 2023',
    '2023-08-01 00:00:00': 'Aug 2023',
    '2023-09-01 00:00:00': 'Sep 2023',
    '2023-10-01 00:00:00': 'Oct 2023',
    '2023-11-01 00:00:00': 'Nov 2023',
    '2023-12-01 00:00:00': 'Dec 2023'
}
df.rename(columns=new_column_names, inplace=True)

# Standardize city names based on the mapping provided
correct_city_names = {
    'Bâle': 'Basel',
    'Genève': 'Geneva',
    'Bienne': 'Biel/Bienne',
    'Chavannes': 'Chavannes-de-Bogis',
    'Marin': 'Marin-Epagnier',
    'Vesenaz': 'Vésenaz',
    'Yverdon': 'Yverdon-les-Bains',
    'Saint-Gall Webersbleiche': 'St. Gall'
}
df['Row Labels'] = df['Row Labels'].apply(lambda x: correct_city_names.get(x, x))

The dataset from 2023 was meticulously cleaned and standardized to ensure accuracy in our analysis. Initial steps included loading the data from an Excel file and renaming columns to reflect clearer, month-specific sales data for easier readability. Additionally, we corrected city names to maintain consistency across datasets. This included mapping various forms of city names to their standardized counterparts (e.g., ‘Bâle’ to ‘Basel’).

2.3.1.2 Description

The dataset is very light and contains monthly sales data for the year 2023. It is essential however for the analysis of the sales of Lait Equitable in different Manor stores.

Here is a preview of the cleaned and structured data:

Click to show code

#load python df
df_sales_2023 <- py$df

# Load necessary libraries
library(tibble)
library(kableExtra)

# Create a tibble with variable descriptions for df_manor_sales
variable_table <- tibble(
  Variable = c("Row Labels", "Monthly Columns (2023-01-01 to 2023-12-01)", "Grand Total"),
  Description = c(
    "Identifies the Manor store location by name.",
    "Each column represents sales figures for a specific month of 2023",
    "Total sales across all months of 2023 for each location"
  )
)

# Display the table using kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center", bootstrap_options = c("striped", "bordered", "hover", "condensed"))

Variable	Description
Row Labels	Identifies the Manor store location by name.
Monthly Columns (2023-01-01 to 2023-12-01)	Each column represents sales figures for a specific month of 2023
Grand Total	Total sales across all months of 2023 for each location

Click to show code


# Using the provided column names correctly in the dataframe df_sales_2023
df_sales_2023_show <- df_sales_2023 %>%
  # Ensure you convert the column names to standard ones if needed
  rename(Location = `Row Labels`) %>%
  # Correctly sum the monthly sales columns from Jan 2023 to Dec 2023
  mutate(Total_Sales = rowSums(select(., `Jan 2023`:`Dec 2023`), na.rm = TRUE)) %>%
  select(Location, `Jan 2023`:`Dec 2023`, Total_Sales) %>%
  mutate_if(is.numeric, round, 2)  # round all numeric columns to 2 decimal places

# Display the data using reactable for an interactive table
reactable(
  df_sales_2023_show,  
  highlight = TRUE,  # Highlight rows on hover
  defaultPageSize = 10,  # Display 10 rows per page
  paginationType = "numbers",  # Use numbers for page navigation
  searchable = TRUE,  # Make the table searchable
  sortable = TRUE,  # Allow sorting
  resizable = TRUE  # Allow column resizing
)

2.3.2 Dataset Sales 2022

2.3.2.1 Wrangling and Cleaning

Following the methodology established with the 2023 dataset, the 2022 sales data was similarly processed. The data from 2022, while structurally different, was also standardized to facilitate comparison and analysis. This included renaming columns to ensure uniformity in location names across both datasets.

Click to show code

# Load the data for 2022
file_path_2022 = '../data/sales_2022.xlsx'
df_2022 = pd.read_excel(file_path_2022)

# Standardize city names based on the provided mapping
city_name_mapping = {
    'Bâle': 'Basel',
    'Genève': 'Geneva',
    'Bienne': 'Biel/Bienne',
    'Chavannes': 'Chavannes-de-Bogis',
    'Marin': 'Marin-Epagnier',
    'Vesenaz': 'Vésenaz',
    'Yverdon': 'Yverdon-les-Bains',
    'Saint-Gall Webersbleiche': 'St. Gall'
}

# Rename columns to standardize city names
df_2022.rename(columns=city_name_mapping, inplace=True)

# Pivoting the table to get total sales per location for 2022, summing across all products
sales_columns_2022 = [col for col in df_2022.columns if col not in ['Code article', 'Description article', 'Marque', 'Code Fournisseur', 'Description Fournisseur']]
df_2022_total_sales = df_2022[sales_columns_2022].sum().reset_index()
df_2022_total_sales.columns = ['Location', 'Total Sales 2022']

2.3.2.2 Description

The 2022 dataset, unlike the 2023 dataset, includes a variety of products, each recorded with sales figures across different locations. This dataset is notably less complex, focusing on total sales rather than monthly breakdowns, yet provides critical insights into the sales performance of different products.

Click to show code

# Load the 2022 sales data
df_sales_2022 <- py$df_2022
# Load necessary libraries
library(tibble)
library(kableExtra)

# Create a tibble with variable descriptions for df_sales
variable_table <- tibble(
  Variable = c("Code article", "Description article", "Marque", "Code Fournisseur", "Description Fournisseur",
               "Location Columns (e.g., Ascona-Delta, Baden, Bâle, etc.)"),
  Description = c(
    "Unique identifier for each product.",
    "Descriptive name of the product.",
    "Brand of the product.",
    "Supplier code.",
    "Supplier name.",
    "Each of these columns represents sales figures for that specific location."
  )
)

# Display the table using kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center", bootstrap_options = c("striped", "bordered", "hover", "condensed"))

Variable	Description
Code article	Unique identifier for each product.
Description article	Descriptive name of the product.
Marque	Brand of the product.
Code Fournisseur	Supplier code.
Description Fournisseur	Supplier name.
Location Columns (e.g., Ascona-Delta, Baden, Bâle, etc.)	Each of these columns represents sales figures for that specific location.

Here is a closer look at the structured 2022 sales data:

Click to show code

library(dplyr)
library(reactable)

# Assuming the dataframe is correctly named df_sales_2022 and is already loaded
# Ensure the dataframe is available in your R environment
print(head(df_sales_2022))
#>   Code article                      Description article
#> 1        1e+08 CRÈME CAFÉ UHT15%MG LAIT FAIRSWISS10X12G
#> 2        1e+08        LAIT ÉQUITABLE ENTIER 3.5% UHT 1L
#> 3        1e+08         LAIT ÉQUITABLE DRINK 1.5% UHT 1L
#> 4        1e+08        LAIT ÉQUITABLE DRINK 1.5% PAST 1L
#> 5        1e+08       LAIT ÉQUITABLE ENTIER 3.5% PAST 1L
#>                Marque Code Fournisseur Description Fournisseur
#> 1 FAIRSWISS FAIRSWISS          2791100                CREMO SA
#> 2          Cremo (NA)          2791100                CREMO SA
#> 3             NA (NA)          2791100                CREMO SA
#> 4          Cremo (NA)          2791100                CREMO SA
#> 5          Cremo (NA)          2791100                CREMO SA
#>   Ascona-Delta Baden Basel Balerna Biel/Bienne Chavannes-de-Bogis
#> 1          661   224   333     148         280                689
#> 2          701   215   201     416        3025               7532
#> 3          498   134   313     183        3238               5547
#> 4          107   127    45      32          19                297
#> 5           97   130   110      48          49                294
#>   Chur Delémont Emmen Fribourg Geneva Lausanne Lugano Marin-Epagnier
#> 1  236      242   553      679   1088      732    719            337
#> 2  101     2405   290     8025   6509     6235    399          16843
#> 3   86     1427   243     5946   8174     5888    491          13022
#> 4   11       22    13       61    161      175     54             11
#> 5   47       35     8       81    136      156    108             15
#>   Monthey Morges Nyon Rapperswil St. Gall San Antonino Sargans
#> 1     636    453  221        172      204          107     220
#> 2   30013   4017 1895         83      261          415     351
#> 3   19175   3799 1653        112      327          338     215
#> 4      54     60   53          7       16           31      21
#> 5      87     95   41          3       40           44      77
#>   Sierre Sion Vésenaz Vevey Vezia Yverdon-les-Bains
#> 1    581  357     315   642   364               412
#> 2  17582 7809    2197 12646   487              5004
#> 3  13905 6901    1707 12172   407              5450
#> 4     52   82      59   129    65                29
#> 5     92   59      36   131   141                24

# Prepare the data by calculating the total sales per product across all locations
df_sales_2022_show <- df_sales_2022 %>%
  mutate(Total_Sales = rowSums(select(., `Ascona-Delta`:`Yverdon-les-Bains`), na.rm = TRUE)) %>%
  select(`Code article`, `Description article`, `Marque`, `Code Fournisseur`, `Description Fournisseur`, `Ascona-Delta`:`Yverdon-les-Bains`, Total_Sales) %>%
  mutate_if(is.numeric, round, 2)  # Round all numeric columns to 2 decimal places

# Display the data using reactable for an interactive and visually appealing table
reactable(
  df_sales_2022_show,
  highlight = TRUE,
  defaultPageSize = 5,
  paginationType = "numbers",
  searchable = TRUE,
  sortable = TRUE,
  resizable = TRUE
)

2.3.2.3 Merging 2022 and 2023 dataset

Click to show code

# Extracting the total sales for 2023 from the first dataset
df_2023_total_sales = df[['Row Labels', 'Grand Total']].rename(columns={'Row Labels': 'Location', 'Grand Total': 'Total Sales 2023'})

# Merging the 2022 and 2023 datasets on Location
merged_sales_data = pd.merge(df_2022_total_sales, df_2023_total_sales, on='Location', how='outer')

# Filling any NaN values that might have occurred due to locations present in one dataset and not the other
merged_sales_data.fillna(0, inplace=True)

Click to show code

# Load the merged sales data
df_merged_sales <- py$merged_sales_data
#show it using reactable
reactable(
  df_merged_sales,  
  highlight = TRUE,  # Highlight rows on hover
  defaultPageSize = 10,  # Display 10 rows per page
  paginationType = "numbers",  # Use numbers for page navigation
  searchable = TRUE,  # Make the table searchable
  sortable = TRUE,  # Allow sorting
  resizable = TRUE  # Allow column resizing
)

The 2022 sales data has been aggregated and standardized for each location. The merged dataset now shows the total sales per location for both 2022 and 2023. This dataset offers a comprehensive view of Lait Equitable’s sales dynamics over two consecutive years, highlighting trends and changes in consumer behavior across different locations.

2.4 Political Parties Dataset

2.4.1 Wrangling and Cleaning

The analysis starts by importing two datasets: sales data (annual sales of fair trade milk) and political party data (support percentages for major parties by location). The political data is cleaned to match commune names in the sales data and transformed into party presence percentages.

Next, the cleaned political data is merged with the sales data based on commune names. This merged dataset enables a combined analysis of political party presence and sales performance.

Click to show code

# Read sales data from Excel
sales_data <- read_excel("../data/Ventes annuelles.xlsx")

# Read political party data from Excel
party_data <- read_excel("../data/partisPolitiqueManor.xlsx")

# Clean up party_data to match sales_data locations
party_data_cleaned <- party_data %>%
  mutate(Location = gsub(" ", "", Location)) %>%
  filter(Location %in% sales_data$Location)

# Update party names to match the new column names
party_data_cleaned <- party_data_cleaned %>%
  mutate(PLR_Presence = PLR / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100,
         PS_Presence = PS / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100,
         UDC_Presence = UDC / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100,
         Centre_Presence = Centre / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100,
         Verts_Presence = Verts / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100,
         Vertliberaux_Presence = Vertliberaux / (PLR + PS + UDC + Centre + Verts + Vertliberaux) * 100) %>%
  filter(PLR_Presence > 0.2 | PS_Presence > 0.2 | UDC_Presence > 0.2 | Centre_Presence > 0.2 | Verts_Presence > 0.2 | Vertliberaux_Presence > 0.2)

# Merge sales_data with updated party presence data
merged_data <- merge(sales_data, party_data_cleaned, by = "Location")

The analysis starts by importing one other dataset: revenue per capita per commune data. The revenue data is cleaned to match commune names in the sales data.

Next, the cleaned revenue data is merged with the sales data based on commune names. This merged dataset enables a combined analysis of revenue per capita per commune and sales performance.

Click to show code

# Load the datasets
revenu_df <- read_excel("../data/revenuParContribuable_CommuneManor.xlsx")

# Merge the datasets on the "Location" column
merged_df <- inner_join(revenu_df, sales_data, by = "Location")

# Clean the data and convert to numeric format
merged_df$`Revenu/contribuable` <- as.numeric(gsub(" ", "", merged_df$`Revenu/contribuable`))
merged_df$`2022` <- as.numeric(gsub(" ", "", merged_df$`2022`))
merged_df$`2023` <- as.numeric(gsub(" ", "", merged_df$`2023`))

2.4.2 Description

Click to show code

# Display the merged data using reactable
reactable(
  merged_data,  
  highlight = TRUE,  # Highlight rows on hover
  defaultPageSize = 10,  # Display 10 rows per page
  paginationType = "numbers",  # Use numbers for page navigation
  searchable = TRUE,  # Make the table searchable
  sortable = TRUE,  # Allow sorting
  resizable = TRUE  # Allow column resizing
)

3 Exploratory Data Analysis

3.1 General analysis of dairy products market

In order to gain a better understanding of Lait Equitable sales, we’re going to start with a global analysis of the dairy market to get an idea of how milk prices work in Switzerland. The objective is to move from a macro-economic analysis to a micro-economic analysis with Lait Equitable. The farm gate price of milk is influenced by a number of factors, including milk production costs, the type of farming and the market situation in Switzerland and abroad, with demand tending to outstrip supply. The farm gate milk price is a weighted average of the prices actually paid to producers. Since 2017, the annual average farm gate milk price has risen steadily and continues its growth.

3.1.1 Distribution of Swiss & International milk

There are 2 types of milk: international milk and CH milk in the column ‘Product subgroup’. It would be interesting to compare the percentages of these two types of milk in order to analyse whether the quantity of milk produced in Switzerland is significant or not.

Click to show code

# Compter les occurrences
occurrences <- clean_data %>%
  count(`Product subgroup`) %>%
  rename(`Product subgroup` = `Product subgroup`, `Nombre d'occurrences` = n)

# Créer l'histogramme interactif avec Plotly
fig <- plot_ly(
  data = occurrences, 
  x = ~`Product subgroup`, 
  y = ~`Nombre d'occurrences`, 
  type = 'bar', 
  text = ~`Nombre d'occurrences`,
  textposition = 'outside',
  hoverinfo = 'text',
  hovertext = ~paste('Frequency:', `Nombre d'occurrences`),
  marker = list(color = c('#87CEEB', '#90EE90'))  # Couleurs douces
) %>%
  layout(
    title = 'Occurrence of CH milk and International milk',
    xaxis = list(title = 'Product subgroup'),
    yaxis = list(title = 'Frequency'),
    width = 400, height = 400
  )
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Afficher le graphique
fig

The histogram shows a higher percentage of CH milk than International Milk. This suggests that the Swiss dairy market is thriving.

3.1.2 Milk price trends by country

We are going to show price trends by country and carry out a macroeconomic analysis. The column ‘Product origin’ has 10 different origins, which are shown below.

Click to show code

# Définir les valeurs à exclure
regions_to_exclude <- c('Region 1', 'Region 2', 'Region 3', 'Region 4', 'Region 5')

# Filtrer les données en excluant ces valeurs
filtered_data <- clean_data %>%
  filter(!`Product origin` %in% regions_to_exclude)

# Convertir la colonne 'Date' en format Date si nécessaire
filtered_data$Date <- ymd(filtered_data$Date)

# Calculer le prix moyen par 'Product origin' et 'Date'
average_price <- filtered_data %>%
  group_by(`Product origin`, Date) %>%
  summarise(Average_Price = mean(Price, na.rm = TRUE), .groups = 'drop') %>%
  ungroup()

# Créer un graphique interactif avec Plotly
fig <- plot_ly(data = average_price, x = ~Date, y = ~Average_Price, color = ~`Product origin`, type = 'scatter', mode = 'lines')

# Personnaliser la couleur pour la Suisse
fig <- fig %>%
  layout(
    title = 'Average milk price by country over time',
    xaxis = list(title = 'Year', tickformat = "%Y", dtick = "M12", tickangle = 45),
    yaxis = list(title = 'Average price'),
    legend = list(title = list(text = 'Origin'), x = 1.05, y = 1),
    width = 800, height = 400
  )
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Mettre en évidence la Suisse en rose
fig <- fig %>%
  add_trace(
    data = average_price %>% filter(`Product origin` == "Switzerland"),
    x = ~Date, y = ~Average_Price,
    type = 'scatter', mode = 'lines',
    line = list(color = 'pink'),
    name = 'Switzerland'
  )

# Afficher le graphique
fig
#> Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
#> Returning the palette you asked for with that many colors
#> Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
#> Returning the palette you asked for with that many colors

We can see that Switzerland is far superior to other countries in terms of price, over the entire period observed, from 2001 to 2024. Given that Switzerland is the most expensive of all the countries listed, this is not surprising. However, the difference in centimes remains significant, up to twice the price or more.

3.2 Analysis of the Swiss dairy products market

The main objective of this project is to focus on milk producers in Switzerland and more specifically on their remuneration. Now we’re going to focus on Switzerland and its milk production. To do this, we are going to use our sub data set ‘swiss_production_data’.

3.2.1 Distribution of milk cost for Switzerland

Click to show code

# Créer le graphique de densité
gg_density <- ggplot(swiss_production_data, aes(x = Price)) +
  geom_density(fill = "skyblue", alpha = 0.5) +
  labs(
    x = "Price in centimes",
    y = "Density",
    title = "Density Plot of Production Cost in Switzerland"
  ) +
  theme_minimal()

# Convertir le graphique ggplot en graphique interactif avec plotly
plotly_density <- ggplotly(gg_density) %>% 
  layout(width = 500, height = 400)
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Afficher le graphique interactif
plotly_density

This distribution, which covers all Swiss regions, gives an average price of around 75.25 centimes. The distribution is asymmetrical, with a more gradual rise to this value followed by an sharp fall.

Given that the sample size differs for each region, we represent the distribution of the cost of production for each region in terms of density to be able to compare them.

3.2.2 Distribution of milk cost by region

Click to show code

# Définir la nouvelle palette de couleurs
palette <- c("red", "green", "blue", "orange", "purple")

# Tri des régions
regions_sorted <- sort(unique(swiss_production_data$`Product origin`))

# Ajouter une colonne 'Color' pour assigner une couleur à chaque région
swiss_production_data <- swiss_production_data %>%
  mutate(Color = factor(`Product origin`, levels = regions_sorted, labels = palette))

# Créer le graphique de densité
gg_density <- ggplot(swiss_production_data, aes(x = Price, color = `Product origin`, fill = `Product origin`)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = palette) +
  scale_color_manual(values = palette) +
  labs(
    x = "Price in centimes",
    y = "Density",
    title = "Distribution of Price Density by Region",
    fill = "Sales regions",
    color = "Sales regions"
  ) +
  theme_minimal()

# Convertir le graphique ggplot en graphique interactif avec plotly
plotly_density <- ggplotly(gg_density, tooltip = c("x", "y", "color")) %>% 
  layout(width = 500, height = 400)
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Afficher le graphique interactif
plotly_density

The region 1 is the most expensive region in comparison to the others and the region 3 shows the lowest price in Switzerland. (see table of region above for details)

3.2.3 Distribution of milk production by region

We will now analyse the distribution of kg of milk produced across the different regions. This will give us an overview of the regions with the highest production rates and therefore the most competitive regions in terms of quantity produced.

Click to show code

# Compter les occurrences de chaque type de produit
product_counts <- as.data.frame(table(swiss_production_data$`Product origin`))
names(product_counts) <- c("Product origin", "Count")

# Définir l'ordre désiré des régions
order <- c('Region 1', 'Region 2', 'Region 3', 'Region 4', 'Region 5')

# Réindexer les données dans l'ordre désiré
product_counts$`Product origin` <- factor(product_counts$`Product origin`, levels = order)

# Ajuster les couleurs pour les rendre plus claires
adjusted_colors <- c(
  'Region 1' = adjustcolor('red', alpha.f = 0.6),
  'Region 2' = adjustcolor('green', alpha.f = 0.6),
  'Region 3' = adjustcolor('blue', alpha.f = 0.6),
  'Region 4' = adjustcolor('orange', alpha.f = 0.6),
  'Region 5' = adjustcolor('purple', alpha.f = 0.6)
)

# Créer un bar chart avec ggplot2
p <- ggplot(product_counts, aes(x = `Product origin`, y = Count, fill = `Product origin`, text = paste("Product origin:", `Product origin`, "<br>Count:", Count))) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = adjusted_colors) +
  labs(title = 'Distribution of milk production in Switzerland', x = 'Product origin', y = 'Number of Occurrences') +
  theme_minimal() +
  theme(legend.position = "none") +
  ylim(0, max(product_counts$Count) * 1.2)

# Rendre le graphique interactif avec plotly
fig <- ggplotly(p, tooltip = c("text")) %>% 
  layout(width = 600, height = 400)
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Afficher le graphique
fig

Milk production is evenly distributed between regions 1, 2 and 4. However, regions 3 and 5 show lower production, particularly region 3. This distribution of production can be explained by the location of the regions. (see table of region above for details)

3.2.4 Average milk price per system of production over time

For the multivariate analysis, we’re going to focus on the production system and analyse the impact of the production system on price over time.

Click to show code

# Convertir la colonne 'Date' en datetime
swiss_production_data$Date <- as.Date(swiss_production_data$Date, format="%Y-%m-%d")

# Calculer le prix moyen par système de production et par date
average_price <- swiss_production_data %>%
  group_by(`System of production`, Date) %>%
  summarise(Price = mean(Price, na.rm = TRUE), .groups = 'drop')

# Définir les couleurs pour chaque système de production
colors <- c("Bio" = "green", "Conventional" = "orange", "Unknown" = "blue")

# Créer une liste de traces pour chaque système de production
traces <- list()
systems <- unique(average_price$`System of production`)
for (system in systems) {
  subset <- filter(average_price, `System of production` == system)
  trace <- list(
    x = subset$Date,
    y = subset$Price,
    mode = 'lines',
    name = system,
    type = 'scatter',
    line = list(color = colors[system])
  )
  traces <- append(traces, list(trace))
}

# Créer la figure
fig <- plot_ly(width = 1000)
for (trace in traces) {
  fig <- add_trace(fig, x = trace$x, y = trace$y, mode = trace$mode, name = trace$name, type = trace$type, line = trace$line)
}

# Définir les années uniques pour l'axe x
years <- unique(format(average_price$Date, "%Y"))

# Ajouter des titres et des légendes, et ajuster les paramètres de l'axe x et y
fig <- layout(fig,
              title = 'Average production price by system of production over time',
              xaxis = list(
                title = 'Date',
                tickmode = 'array',
                tickvals = as.Date(paste0(years, "-01-01")),  # Afficher toutes les années
                ticktext = years,
                tickangle = -45,
                tickfont = list(size = 10)
              ),
              yaxis = list(
                title = 'Average Price (in cents)',
                range = c(48, 105)  # Définir les limites de l'axe y
              ),
              legend = list(title = list(text = 'System.of.Production')),
              margin = list(b = 100),
              width = 800, height = 400# Augmenter la marge inférieure pour les étiquettes pivotées
)
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()

# Afficher le graphique
fig

An organic production system means higher prices as shown on the graph. The average price per kg of milk produced was higher in early 2001, around 83.5 cents for conventional milk production and almost 1.00 CHF for organic milk. The highest price recorded until now. The ‘unknown’ data are highly correlated with the ‘conventional’ data, the two curves are overlapping. As the ‘Unknown’ values do not provide any additional information, we will only consider the ‘Conventional’ mode for the rest of our analysis. There is a fall in prices until 2007, followed by a peak in 2008, which can be explained by the 2008 financial crisis. We note a seasonality and a trend patterns. The non-linear trend is upward since 2017. Does this trend will continue to go upward ? What are the price predictions ? We will answer this question in the Analysis section.

3.3 Lait Equitable Products

To provide a comprehensive understanding of Lait Equitable’s sales trends throughout 2023, we performed a month-by-month sales analysis. This exploration helps identify seasonal effects, peak sales periods, and potential areas for strategic adjustments. Here’s a detailed breakdown of the approach and findings:

3.3.1 Sales Distribution Accross Months

Click to show code

# Remove 'Grand Total' column and the row labels column
monthly_sales <- df_sales_2023 %>% 
  select(-c(`Grand Total`, `Row Labels`))

# Aggregate the sales per month across all locations
total_sales_per_month <- colSums(monthly_sales)

# Create a data frame for plotting
monthly_sales_df <- data.frame(Month = names(total_sales_per_month), Sales = total_sales_per_month)

# Sort the data frame by Sales in descending order
sorted_monthly_sales_df <- monthly_sales_df %>%
  arrange(desc(Sales))

# Plotting with ggplot2 using viridis color palette
ggplot(sorted_monthly_sales_df, aes(x=reorder(Month, -Sales), y=Sales, fill=Month)) +
  geom_bar(stat="identity", show.legend = FALSE, fill = "#24918d", color = "black") + 
  labs(title = "Total Sales by Month Across All Locations (2023)", x = "Month", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The graph shows total monthly sales across all locations for the “Lait Equitable” in 2023. It shows the month with the highest sales, March and continuing to lower sales months. The least profitable month appears to be July.

Highest Sales in March: The graph starts with March, which shows the highest sales, almost reaching 25,000 units. This suggests that March was a particularly strong month for sales, possibly due to seasonal factors or specific marketing campaigns.
Gradual Decline in Sales: As we move from left to right, there is a general trend of declining sales. After March, the next highest sales are in December, followed by April, May, and so on. This indicates that sales in March were not sustained throughout the year.
Mid-year and End-Year Trends: While the graph is not in chronological order, it shows that some months like December (typically strong due to the holiday season) also performed well, but none reached the peak seen in March.
Lower Sales in the Latter Months Displayed: The months at the right end of the graph, such as June and July, show the lowest sales figures in the year. This could indicate a seasonal dip or other market dynamics affecting these months. One supposition could be that people are on vacations at these dates due to school vacations.

3.3.2 Sales Distribution Accross Locations

Click to show code

# First, we need to remove the 'Grand Total' column if it's included
df <- df_sales_2023[, -ncol(df_sales_2023)]
# Sum sales across all months for each location
total_sales_by_location <- df %>%
  mutate(Total_Sales = rowSums(select(., -`Row Labels`))) %>%
  select(`Row Labels`, Total_Sales)

# Sort the locations by total sales in descending order
sorted_sales_by_location <- total_sales_by_location %>%
  arrange(desc(Total_Sales))

# Plotting the data with ggplot2
ggplot(sorted_sales_by_location, aes(x=reorder(`Row Labels`, Total_Sales), y=Total_Sales, fill=`Row Labels`)) +
  geom_bar(stat="identity", show.legend = FALSE,  fill = "#24918d", color = "black") + 
  labs(title = "Total Sales by Location (2023)", x = "Location", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=0.5)) # Rotate the x-axis text for better readability

The graph illustrates the total sales by location for Lait Equitable across various stores in 2023, organized from the lowest to the highest sales volume.

Variability in Sales Across Locations: The graph displays a significant variation in sales across different locations. The left side of the graph shows locations with the least sales, starting with Chur, Rapperswil, St. Gall, and progressively increasing towards the right.
Low Sales in Certain Areas: Locations like Chur, Rapperswil, and St. Gall have notably low sales, which could indicate either a lower demand for Lait Equitable’s products in these areas or possibly less effective marketing and distribution strategies.
High Sales in Specific Locations: The right end of the graph, particularly the last five locations, shows a sharp increase in sales. Notably, Vevey, Marin-Epagnier, Sierre and Monthey exhibit high sales, with Monthey being the highest. This might indicate a stronger market presence, better consumer acceptance, or more effective promotional activities in these regions.
Potential Market Strengths and Weaknesses: The graph effectively highlights where Lait Equitable is performing well and where there might be room for improvement. For instance, the high sales in cities like Sierre and Monthey suggest strong market penetration and acceptance.
Strategic Insights: For the Lait Equitable, this graph provides crucial data points for understanding which locations might need more focused marketing efforts or adjustments in distribution strategies. Additionally, it could help in identifying successful strategies in high-performing locations that could be replicated in areas with lower sales.

Click to show code

#remove grand total
df <- df_sales_2023[, -ncol(df_sales_2023)]
# Transform the data into a long format where each row contains a location, a month, and sales
long_data <- df %>%
  pivot_longer(cols = -`Row Labels`, names_to = "Month", values_to = "Sales") %>%
  mutate(Location = `Row Labels`)

# Create a plotly object for an interactive boxplot
fig <- plot_ly(long_data, x = ~Location, y = ~Sales, type = 'box',
               hoverinfo = 'text', text = ~paste('Month:', Month, '<br>Sales:', Sales),
               marker = list(color = "#7e57c2",
                             boxpoints = "all",
                             jitter = 0.3),
               box = list(line = list(color = "#24918d"))
               ) %>% 
  layout(title = "Distribution of Monthly Sales Across Locations",
         xaxis = list(title = "Location"),
         yaxis = list(title = "Monthly Sales"),
         showlegend = FALSE, 
         width= 600,
         height = 800) %>% 
  config(displayModeBar = FALSE) # Optional: hide the mode bar
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()


# Display the plot
fig
#> Warning: 'box' objects don't have these attributes: 'box'
#> Valid attributes include:
#> 'alignmentgroup', 'boxmean', 'boxpoints', 'customdata', 'customdatasrc', 'dx', 'dy', 'fillcolor', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hoveron', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'jitter', 'legendgroup', 'legendgrouptitle', 'legendrank', 'line', 'lowerfence', 'lowerfencesrc', 'marker', 'mean', 'meansrc', 'median', 'mediansrc', 'meta', 'metasrc', 'name', 'notched', 'notchspan', 'notchspansrc', 'notchwidth', 'offsetgroup', 'opacity', 'orientation', 'pointpos', 'q1', 'q1src', 'q3', 'q3src', 'quartilemethod', 'sd', 'sdsrc', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textsrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'upperfence', 'upperfencesrc', 'visible', 'whiskerwidth', 'width', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'

This graphs show the variability in sales across different locations for each month in 2023. The boxplot provides a visual representation of the distribution of sales figures, highlighting the range, median, and outliers in each location.

We observe that the outliers are high or low sales months as we analyze previously. It confirms the previous analysis and provides a more detailed view of the sales distribution across locations.

3.3.3 Top Performing / Worse Performing Locations

Click to show code

#using grand total to sort the data from top to bottom
df <- df_sales_2023
#using 'grand total' column as total sales plot the top and bottom locations
df %>%
  arrange(desc(`Grand Total`)) %>%
  slice_head(n = 5) %>%
  select(`Row Labels`, `Grand Total`) %>%
  ggplot(aes(x = reorder(`Row Labels`, `Grand Total`), y = `Grand Total`, fill = `Row Labels`)) +
  geom_bar(stat = "identity", show.legend = FALSE, fill = "#33848D", color = "black") +
  labs(title = "Top 5 Performing Locations by Total Sales (2023)", x = "Location", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=0.5)) # Rotate the x-axis text for better readability

# worse performing locations
df %>%
  arrange(`Grand Total`) %>%
  slice_head(n = 5) %>%
  select(`Row Labels`, `Grand Total`) %>%
  ggplot(aes(x = reorder(`Row Labels`, `Grand Total`), y = `Grand Total`, fill = `Row Labels`)) +
  geom_bar(stat = "identity", show.legend = FALSE, fill = "#33848D", color = "black") +
  labs(title = "Bottom 5 Performing Locations by Total Sales (2023)", x = "Location", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=0.5)) # Rotate the x-axis text for better readability

As previously analyzed, the top and bottom performing locations are displayed in the bar charts. The top 5 locations with the highest total sales are shown in the first graph, while the bottom 5 locations with the lowest total sales are displayed in the second graph.

Top-performing locations are : Monthey, Sierre, Marin-Epagnier, Vevey, and Sion. Worse-performing locations are : Basel, St. Gall, Sargans, Rapperswil, and Chur.

3.3.4 2022 vs 2023

Click to show code

#plot a bar chart to compare the total sales in 2022 and 2023 and add transparency to the bars
df_merged_sales %>%
  ggplot(aes(x = reorder(Location, -`Total Sales 2023`), y = `Total Sales 2023`, fill = "2023")) +
  geom_bar(aes(x = reorder(Location, -`Total Sales 2022`), y = `Total Sales 2022`, fill = "2022"), stat = "identity", position = "dodge", fill = "#7e57c2", color = "black", alpha = 0.7 ) +
  geom_bar(stat = "identity", position = "dodge", fill = "#33848D", color = "black", alpha = 0.7) +
  labs(title = "Total Sales Comparison Between 2022 and 2023 by Location", x = "Location", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=0.5)) # Rotate the x-axis text for better readability

Trend of Decline: A significant number of Manor locations have lower sales figures in 2023 compared to 2022. This trend suggests that Lait Equitable might be facing challenges in these areas, which could include increased competition, changing consumer preferences, or other market dynamics affecting the demand for their products.
Monthey’s Decline: The bar chart shows that Monthey experienced a substantial decrease in sales in 2023 compared to 2022. This would be a key point of concern for Lait Equitable, and understanding why Monthey is underperforming is essential. This could be due to a range of factors, such as local economic conditions, operational challenges, increased competition, or changes in consumer preference within that particular area.

3.3.5 Map

We used the datasets about Lait Equitable Sales in order to show each product sales on a map.
We took the producers locations on the fairswiss.ch website in order to display them on a map. We also displayed the sales locations of the Lait Equitable products. The map shows the distribution of Lait Equitable sales locations and the proximity of these locations to the producers. This analysis helps identify potential patterns or correlations between sales and proximity to producers.
A Heatmap was then made in order to see the concentration of producers in Switzerland

You can play with the layer on the top right of the map to see how the sales change between each location for each product

Click to show code

#producers locations
locations = [
      "46.5907889,6.7386114","46.5842792,6.6649234","46.5149217,6.888093","46.5366608,6.7054361","46.2962374,6.9946778","46.5575317,6.7491415","46.5407979,6.7757944","46.5732277,6.7334704","46.6179705,6.6729034","46.4438248,6.9216581","46.5244929,6.7652973","46.5535214,6.7766663","46.9087674,7.0772625","46.5072978,6.7303043","46.7658972,6.5635194","46.4890932,6.839986","46.5343528,6.7394634","46.6558216,6.7369549","46.5156779,6.8621498","46.6112911,6.8073595","46.6666946,6.8991534","46.7395322,6.9715464","46.6468967,6.9540885","46.5153462,6.8584537","46.5975359,7.0382028","46.595883,6.8855935","46.6612814,6.8782794","46.7087666,7.1239746","46.6510812,6.9711981","46.6141324,7.0318659","46.601508,7.1001934","46.6091436,7.0307497","46.5766179,6.8634981","46.5954987,6.9855279","46.829064,6.9264143","46.6588489,6.8758917","46.6124961,6.9757718","47.2915521,7.3217577","47.2775379,7.3830611","47.1202632,6.8834564","47.2786229,7.3611541","47.270679,7.39172","47.3439771,7.4949095","47.3276332,7.218765","47.3660465,7.4060015","47.3649491,7.46977","47.2032746,6.9890961","47.2092332,7.0068912","46.3260628,6.9042257","46.2126721,6.9985419","46.1163846,7.112862","46.3649142,6.8995776","46.9232972,6.4673148","47.0769951,6.756922","46.0377232,8.8837298","46.1646151,8.9659276","46.1292513,8.985035","46.0029796,8.8540675","46.4694663,8.9413945","46.1460954,8.9345517","47.3859362,7.4313549","47.4475168,7.5593599","47.3809253,7.4221544","47.4364108,9.3341963","47.4304657,9.3321608","47.4341264,9.3481821","47.4345287,9.3262813","47.426417,9.3338866","47.4652773,9.1518866","47.2749047,9.5092842","47.1135124,9.1297439","47.404214,9.3419733","47.5295845,8.4670763","47.1738126,8.3342979","47.08052,8.2031428","47.2351478,8.1477158","46.7883954,7.4004079","47.0364909,7.2786631","46.8364949,7.7665098","46.7833635,7.3912478"]

Click to show code

# Function to calculate the dynamic radius
def calculate_radius(volume, max_volume, min_volume, max_radius=20):
    normalized_volume = (volume - min_volume) / (max_volume - min_volume)
    return normalized_volume * max_radius + 3

# Function to get latitude and longitude
def get_lat_lon(city):
    try:
        time.sleep(1)  # Simple rate-limiting mechanism
        location = geolocator.geocode(city + ', Switzerland')
        return location.latitude, location.longitude
    except AttributeError:
        return None, None

# Read data from different product categories
file_paths = {
    'All Products': ("../data/Produits laitiers équitables - 2023.xlsb", 'Par SM'),
    'Milk Drink': ("../data/lait_drink_sales_per_stores_2023.xlsx", 'Sheet1'),
    'Milk Entier': ("../data/lait_entier_sales_per_stores_2023.xlsx", 'Sheet1'),
    'Fondue': ("../data/fondue_sales_per_stores_2023.xlsx", 'Sheet1'),
    'Delice': ("../data/delice_sales_per_stores_2023.xlsx", 'Sheet1'),
    'Creme': ("../data/creme_cafe_sales_per_stores_2023.xlsx", 'Sheet1')
}

# Create a folium map
m = folium.Map(location=[46.8182, 8.2275], zoom_start=8)
# Instantiate the geolocator
geolocator = Nominatim(user_agent="le_stores")

#####map for store locations per products
# Loop through each category 
for category, (file_path, sheet_name) in file_paths.items():
    engine = 'pyxlsb' if 'xlsb' in file_path else None
    df = pd.read_excel(file_path, engine=engine, sheet_name=sheet_name)

    if category == 'All Products':
        # Skip the first six rows and rename columns based on the provided structure
        df = df.iloc[6:]  
        df.rename(columns={
            'Quantités vendues - année 2023': 'City',
            'Unnamed: 1': '01/01/2023',
            'Unnamed: 2': '02/01/2023',
            'Unnamed: 3': '03/01/2023',
            'Unnamed: 4': '04/01/2023',
            'Unnamed: 5': '05/01/2023',
            'Unnamed: 6': '06/01/2023',
            'Unnamed: 7': '07/01/2023',
            'Unnamed: 8': '08/01/2023',
            'Unnamed: 9': '09/01/2023',
            'Unnamed: 10': '10/01/2023',
            'Unnamed: 11': '11/01/2023',
            'Unnamed: 12': '12/01/2023',
            'Unnamed: 13': 'Total General'
        }, inplace=True)
    else:
        # Renaming columns for XLSX files based on your last dataframe example
        df.rename(columns={
            df.columns[0]: 'City',
            df.columns[-1]: 'Total General'
        }, inplace=True)

    # Standardize city names
    correct_city_names = {
        'Bâle': 'Basel',
        'Genève': 'Geneva',
        'Bienne': 'Biel/Bienne',
        'Chavannes': 'Chavannes-de-Bogis',
        'Marin': 'Marin-Epagnier',
        'Vesenaz': 'Vésenaz',
        'Yverdon': 'Yverdon-les-Bains',
        'Saint-Gall Webersbleiche': 'St. Gall'
    }
    df['City'] = df['City'].apply(lambda x: correct_city_names.get(x, x))

    # Get latitudes and longitudes
    df[['Lat', 'Lon']] = df.apply(lambda row: pd.Series(get_lat_lon(row['City'])), axis=1)

    # Define color scale and feature group
    max_sales = df['Total General'].max()
    min_sales = df['Total General'].min()
    color_scale = cmp.linear.viridis.scale(min_sales, max_sales)
    fg = folium.FeatureGroup(name=category)
    
    # Add markers
    for index, row in df.iterrows():
        if pd.notnull(row['Lat']) and pd.notnull(row['Lon']):
            radius = calculate_radius(row['Total General'], max_sales, min_sales)
            folium.CircleMarker(
                location=[row['Lat'], row['Lon']],
                radius=radius,
                popup=f"{row['City']}: {row['Total General']}",
                color=color_scale(row['Total General']),
                fill=True,
                fill_color=color_scale(row['Total General'])
            ).add_to(fg)

    fg.add_to(m)
#> <folium.vector_layers.CircleMarker object at 0x0000028DF567D070>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF78B1EB0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF788FAD0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3DA3620>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7841AC0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE2534800>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE228CD70>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF50A7F80>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF5446870>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF56C0A70>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CE54F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7DE3F80>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7DE1610>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3D65070>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7DE0A40>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CE36B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CE1100>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CE32C0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E23770>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21E80>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E22FF0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E22E40>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E213A0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21FA0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E200B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21C10>
#> <folium.map.FeatureGroup object at 0x0000028DF565EE10>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF78B22D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CEE540>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE3CE3530>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9940>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA840>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9760>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA94C0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA95B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAB1D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9910>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9EE0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAB170>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9E50>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA090>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA1E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA270>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA3F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA870>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAA600>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAAE10>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAAC30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAAC60>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAAA20>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAAF30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAB0E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAB830>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EABCE0>
#> <folium.map.FeatureGroup object at 0x0000028DF78B0410>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF78B2600>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE2AEAAB0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EE1DF0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA9880>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA8FE0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA84A0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA83E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0EED0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F5C0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0EAB0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0E450>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0E5A0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0E6C0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0FAA0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0ECF0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0EC30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0E9F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0EC60>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F020>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F320>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F500>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F0E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F7D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F530>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F9E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F6E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0F620>
#> <folium.map.FeatureGroup object at 0x0000028DF55956D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21820>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E202F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E209B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21970>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E23050>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E22900>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E218E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF569FF80>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF55960F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DE2AE8DD0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EE0800>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D7F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0DA60>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0FA40>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0DA90>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0DD60>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0DC70>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0C4D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D700>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D310>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D2E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D4F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0D2B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0C5F0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0CD40>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0C8C0>
#> <folium.map.FeatureGroup object at 0x0000028DF56FA540>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7DC7020>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF50A6F30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF56C2570>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EAB9E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EA8DD0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E21130>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EE03B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF788FC50>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F59B50>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F59910>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A120>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A4B0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A660>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A0C0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A330>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F59E50>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A000>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AF00>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5B110>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A480>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AFF0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AE40>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A690>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5A8A0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AC30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AF60>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5AF30>
#> <folium.map.FeatureGroup object at 0x0000028DF7DE34D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5B1A0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F5B6E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F59520>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E94A10>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7E94A70>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F0FFE0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7EE0560>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF7F586E0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2570>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2AE0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2F00>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2C30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2690>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D31D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2C90>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2A20>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2720>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D2BD0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3260>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3A10>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3140>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3E30>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3860>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3950>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D3380>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D34D0>
#> <folium.vector_layers.CircleMarker object at 0x0000028DF81D39E0>
#> <folium.map.FeatureGroup object at 0x0000028DF7C8CFE0>
###heatmap of producers
heat_data = [[float(lat), float(lon)] for loc in locations for lat, lon in [loc.split(',')]]

# Add HeatMap layer
HeatMap(heat_data).add_to(m)
#> <folium.plugins.heat_map.HeatMap object at 0x0000028DE2AFD7F0>

# Add layer control and save the map
folium.LayerControl().add_to(m)
#> <folium.map.LayerControl object at 0x0000028DE3CEE6F0>
m.save('combined_product_map.html')
m

Make this Notebook Trusted to load map: File -> Trust Notebook

This map shows that a high concentration of producers is located in the central-west of Switzerland, particularly around the cities of Lausanne, Fribourg, and Bern. The sales locations of Lait Equitable products are more widespread, with a higher density in the western part of the country, including cities like Geneva, Lausanne, and Vevey. This correlation could suggest a strategic advantage in terms of sourcing and distribution efficiency in these areas. The map provides a visual representation of the proximity between producers and sales locations, highlighting potential areas for strategic partnerships or marketing initiatives.

The south and the east of Switzerland have fewer producers and sales locations, indicating potential opportunities for expansion or targeted marketing efforts in these regions.

4 Analysis

4.1 Forecasting

4.1.1 Decomposition of Milk Price in Switzerland

Click to show code

# Charger les packages nécessaires
library(tidyverse)
library(lubridate)
library(forecast)
library(ggplot2)

# Importer les données
swiss_decomposition <- read_csv('../data/swiss_production_data.csv', show_col_types = FALSE)

Click to show code

# Assurez-vous que les dates sont de type Date et que 'Date' est l'index
swiss_decomposition <- swiss_decomposition %>%
  mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  arrange(Date)

# Calculez la moyenne des prix par date
data <- swiss_decomposition %>%
  group_by(Date = floor_date(Date, "month")) %>%
  summarise(Price = mean(Price, na.rm = TRUE))

# Convertir en série temporelle
data_ts <- ts(data$Price, start = c(year(min(data$Date)), month(min(data$Date))), frequency = 12)

# Appliquez la décomposition classique
decomposition <- decompose(data_ts, type = "additive")  # ou 'multiplicative' selon le cas

# Créez un graphique avec des subplots
par(mfrow = c(4, 1), mar = c(3, 3, 2, 1), oma = c(1, 1, 1, 1))

# Plot the original data
plot(data_ts, main = "Original Data", ylab = "Prix")

# Plot the trend component
plot(decomposition$trend, main = "Trend Component", ylab = "Trend")

# Plot the seasonal component
plot(decomposition$seasonal, main = "Seasonal Component", ylab = "Seasonal")

# Plot the residual component
plot(decomposition$random, main = "Residual Component", ylab = "Residual")

# Ajuster le layout
par(mfrow = c(1, 1))

4.1.2 Overall SARIMA forecast

Click to show code

# Assurez-vous que les dates sont de type Date et que 'Date' est l'index
library(dplyr)
library(lubridate)
library(forecast)
library(ggplot2)

swiss_decomposition <- swiss_decomposition %>%
  mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  arrange(Date)

# Calculez la moyenne des prix par date (par mois)
data <- swiss_decomposition %>%
  group_by(Date) %>%
  summarise(Price = mean(Price, na.rm = TRUE), .groups = 'drop')

# Convertir en série temporelle
data_ts <- ts(data$Price, start = c(year(min(data$Date)), month(min(data$Date))), frequency = 12)

# Ajustement du modèle SARIMA avec auto.arima pour sélectionner les meilleurs paramètres
sarima_model <- auto.arima(data_ts, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)

# Prévision avec intervalles de confiance de 80 % et 95 %
sarima_forecast <- forecast(sarima_model, h = 12, level = c(80, 95))

# Créer des dataframes pour les prévisions et les intervalles de confiance
data_forecast <- data.frame(Date = seq.Date(from = max(data$Date) + months(1), 
                                            by = "month", length.out = 12),
                            Forecast = as.numeric(sarima_forecast$mean),
                            Lower80 = sarima_forecast$lower[,1],
                            Upper80 = sarima_forecast$upper[,1],
                            Lower95 = sarima_forecast$lower[,2],
                            Upper95 = sarima_forecast$upper[,2])

# Tracer les résultats avec ggplot2
ggplot() +
  geom_line(data = data, aes(x = Date, y = Price), color = "blue", size = 0.5) +
  geom_line(data = data_forecast, aes(x = Date, y = Forecast), color = "red", size = 0.5) +
  geom_ribbon(data = data_forecast, aes(x = Date, ymin = Lower80, ymax = Upper80), fill = "blue", alpha = 0.2) +
  geom_ribbon(data = data_forecast, aes(x = Date, ymin = Lower95, ymax = Upper95), fill = "blue", alpha = 0.1) +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  labs(title = 'Overall SARIMA Forecast', x = 'Date', y = 'Average Price') +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> i Please use `linewidth` instead.

The SARIMA model used provides a good approximation of historical trends and provides reasonable forecasts for 2024. However, there are a few points to consider: - relative price stability in the forecast despite historical variability. - confidence intervals show that there is some uncertainty around forecasts, indicating that it is prudent to consider a range of possible outcomes.

4.1.3 SARIMA Forecast for “Bio” and “Conventional”

We saw that our data showed a significant seasonal trend. The SARIMA model takes into account the seasonality of time series in its predictions. This is why the SARIMA model would be better suited to our data.

Click to show code

# Assurez-vous que les dates sont de type Date et que 'Date' est l'index
swiss_decomposition <- swiss_decomposition %>%
  mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  arrange(Date)

# Filtrer les données pour les systèmes de production 'Bio' et 'Conventional'
data_bio <- swiss_decomposition %>%
  filter(`System of production` == 'Bio') %>%
  group_by(Date = floor_date(Date, "month")) %>%
  summarise(Price = mean(Price, na.rm = TRUE)) %>%
  ungroup()

data_conventional <- swiss_decomposition %>%
  filter(`System of production` == 'Conventional') %>%
  group_by(Date = floor_date(Date, "month")) %>%
  summarise(Price = mean(Price, na.rm = TRUE)) %>%
  ungroup()

# Convertir en série temporelle
ts_bio <- ts(data_bio$Price, start = c(year(min(data_bio$Date)), month(min(data_bio$Date))), frequency = 12)
ts_conventional <- ts(data_conventional$Price, start = c(year(min(data_conventional$Date)), month(min(data_conventional$Date))), frequency = 12)

# Ajuster les modèles SARIMA
fit_bio_sarima <- auto.arima(ts_bio, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
fit_conventional_sarima <- auto.arima(ts_conventional, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)

# Prévisions pour les 12 prochains mois
forecast_bio_sarima <- forecast(fit_bio_sarima, h = 12)
forecast_conventional_sarima <- forecast(fit_conventional_sarima, h = 12)

# Tracer les prévisions
autoplot(forecast_bio_sarima) +
  labs(title = "Forecasted Prices of Organic Milk (SARIMA)", x = "Date", y = "Average Price") +
  theme_minimal()

autoplot(forecast_conventional_sarima) +
  labs(title = "Forecasted Prices of Non-Organic Milk (SARIMA)", x = "Date", y = "Average Price") +
  theme_minimal()

For Organic Milk, price forecasts show a stable but slightly rising general trend. For Conventional Milk, price forecast shows a slight initial decline followed by stabilization. Confidence intervals for both types of milk indicate similar uncertainty around forecasts, with areas of moderate and high uncertainty.

4.1.4 Exponential smoothing

Click to show code

# Charger les packages nécessaires
library(tidyverse)
library(lubridate)
library(forecast)
library(ggplot2)

# Filtrer les données pour les systèmes de production 'Bio' et 'Conventional'
data_bio <- swiss_decomposition %>%
  filter(`System of production` == 'Bio') %>%
  group_by(Date = floor_date(Date, "month")) %>%
  summarise(Price = mean(Price, na.rm = TRUE)) %>%
  ungroup()

data_conventional <- swiss_decomposition %>%
  filter(`System of production` == 'Conventional') %>%
  group_by(Date = floor_date(Date, "month")) %>%
  summarise(Price = mean(Price, na.rm = TRUE)) %>%
  ungroup()

# Convertir en série temporelle
ts_bio <- ts(data_bio$Price, start = c(year(min(data_bio$Date)), month(min(data_bio$Date))), frequency = 12)
ts_conventional <- ts(data_conventional$Price, start = c(year(min(data_conventional$Date)), month(min(data_conventional$Date))), frequency = 12)

# Ajuster les modèles de lissage exponentiel (ETS)
fit_bio_ets <- ets(ts_bio)
fit_conventional_ets <- ets(ts_conventional)

# Prévisions pour les 12 prochains mois
forecast_bio_ets <- forecast(fit_bio_ets, h = 12)
forecast_conventional_ets <- forecast(fit_conventional_ets, h = 12)

# Tracer les prévisions
autoplot(forecast_bio_ets) +
  labs(title = "Forecasted Prices of Organic Milk (ETS)", x = "Date", y = "Average Price") +
  theme_minimal()

autoplot(forecast_conventional_ets) +
  labs(title = "Forecasted Prices of Non-Organic Milk (ETS)", x = "Date", y = "Average Price") +
  theme_minimal()

For organic milk, the price forecasts show a stable overall trend, with a slight increase. For non organic milk, the price Forecasts show a slight initial fall, followed by stabilization.

4.1.5 Holt-Winters

We apply the Holt-Winters model to predict the price over the next 12 months. This model is an extension of the simple exponential smoothing model. It takes trend and seasonality into account in its predictions for its two time series, since these two components play a crucial role as we have seen with the decomposition. We took seasonal = “multiplicative” because the multiplicative model shows lower AIC values, indicating a better fit than additive models.

Click to show code

# Charger les packages nécessaires
library(forecast)
library(ggplot2)

# Convertir en série temporelle
ts_bio <- ts(data_bio$Price, start = c(year(min(data_bio$Date)), month(min(data_bio$Date))), frequency = 12)
ts_conventional <- ts(data_conventional$Price, start = c(year(min(data_conventional$Date)), month(min(data_conventional$Date))), frequency = 12)

# Ajuster le modèle de Holt-Winters (multiplicatif pour la saisonnalité multiplicative)
fit_bio_hw <- hw(ts_bio, seasonal = "multiplicative")
fit_conventional_hw <- hw(ts_conventional, seasonal = "multiplicative")

# Prévisions pour les 12 prochains mois
forecast_bio_hw <- forecast(fit_bio_hw, h = 12)
forecast_conventional_hw <- forecast(fit_conventional_hw, h = 12)

# Tracer les prévisions
autoplot(forecast_bio_hw) +
  labs(title = "Forecasted Prices of Organic Milk (Holt-Winters Multiplicative)", x = "Date", y = "Average Price") +
  theme_minimal()

autoplot(forecast_conventional_hw) +
  labs(title = "Forecasted Prices of Non-Organic Milk (Holt-Winters Multiplicative)", x = "Date", y = "Average Price") +
  theme_minimal()

The Holt-Winters model predicts a higher price increase for organic milk. According to this graph, the price would rise to CHF 1 for the producer. The confidence interval rises to CHF 1.10, but does not fall below 87 centimes. For conventional milk, the forecasts fall less and rise more compared with the ETS model. For organic milk, the price forecasts show a stable overall trend, with a slight increase. For non organic milk, the price Forecasts show a slight initial fall, followed by stabilization.

4.1.6 Comparison of all forecasting models

All four models effectively capture historical trends in organic and non-organic milk prices. Forecasts for 2024 are similar between the models, indicating upward trends for organic milk and stabilization for non-organic milk. Confidence intervals for all four models are similar, indicating moderate to high uncertainty. We observe that there are no major differences between the SARIMA, SARIMAX, ETS, and Holt-Winters Multiplicative models in the organic and non-organic milk price forecasts for 2024.

4.2 Lait Equitable Analysis

4.2.1 Pareto Principle

The Pareto Principle, often known as the 80/20 rule, asserts that a small proportion of causes, inputs, or efforts usually lead to a majority of the results, outputs, or rewards. Applied to a business context where approximately 20% of the sales account for 80% of the revenues, this principle can help in identifying and focusing on the most profitable aspects of a business.

Evidence from Research:

Sales and Customer Concentration: Research has consistently shown that a significant portion of sales often comes from a minority of customers or products. For instance, an analysis across 22 different consumer packaged goods categories found an average Pareto ratio (PR) of .73, indicating that the top proportion of products/customers often account for a disproportionately high share of sales or profits Source - Kim, Singh, & Winer, 2017

Decision Making and Resource Allocation: The Pareto Principle helps in decision-making by highlighting areas where the greatest impact can be achieved. For example, focusing on the top-performing products or customers can optimize resource allocation and maximize profits Source - Ivančić, 2014

Market and Profit Concentration: Another study noted that a small number of customers are often responsible for a large portion of sales, which supports the strategic focus on these customers to boost profitability and efficiency Source- McCarthy & Winer, 2018

Conclusion: Applying the Pareto Principle in a business context where a minority of sales drives the majority of revenue can lead to more focused and effective business strategies, optimizing efforts towards the most profitable segments. This approach not only simplifies decision-making but also enhances resource allocation, ultimately leading to increased profitability.

4.2.1.1 Steps

Calculating the total sales across all locations for both 2022 and 2023.
Ranking locations by sales to see the cumulative contribution of each location towards the total.
Identifying the point where approximately 20% of the locations contribute to around 80% of the sales.

Click to show code

#create combine Combined_Sales data by adding col 'Total Sales 2022' and 'Total Sales 2023'
df_merged_sales <- df_merged_sales %>%
  mutate(Combined_Sales = `Total Sales 2022` + `Total Sales 2023`)

# Calculate the total combined sales of all locations
total_combined_sales <- sum(df_merged_sales$Combined_Sales)

# Calculate the percentage contribution of each location
percentage_contributions <- df_merged_sales %>%
  mutate(Percentage_Contribution = (Combined_Sales / total_combined_sales) * 100)
#sort ascending by percentage contribution
percentage_contributions <- percentage_contributions %>%
  arrange(desc(Percentage_Contribution))

# Sort the data by Percentage_Contribution in descending order
percentage_contributions <- percentage_contributions %>%
  arrange(desc(Percentage_Contribution)) %>%
  mutate(Cumulative_Percentage = cumsum(Percentage_Contribution))

We will now identify a percentage of location that accounts for 80% of the sales.

Click to show code

# Sort the data by Percentage_Contribution in descending order
percentage_contributions <- percentage_contributions %>%
  arrange(desc(Percentage_Contribution)) %>%
  mutate(
    Cumulative_Percentage = cumsum(Percentage_Contribution),
    Location_Count = row_number()
  )

# Total number of locations
total_locations <- nrow(percentage_contributions)

# Find the smallest number of locations contributing to approximately 80% of sales
eighty_percent_point <- min(percentage_contributions$Location_Count[percentage_contributions$Cumulative_Percentage >= 80])

# Percentage of locations contributing to 80% of sales
percentage_of_locations <- eighty_percent_point / total_locations * 100

# Create the Pareto chart
pareto_chart <- ggplot(percentage_contributions, aes(x = reorder(Location, -Percentage_Contribution), y = Percentage_Contribution)) +
  geom_bar(stat = "identity", aes(fill = Percentage_Contribution)) +
  scale_fill_viridis(option = "D", direction = -1) +
  geom_line(aes(y = Cumulative_Percentage), group = 1, color = "skyblue", size = 1.5) +
  geom_vline(xintercept = eighty_percent_point, linetype = "dashed", color = "blue") +
  geom_hline(yintercept = 80, linetype = "dashed", color = "blue") +
  scale_y_continuous(sec.axis = sec_axis(~ ., name = "Cumulative Percentage")) +
  labs(title = "Pareto Chart of Sales Contributions by Location",
       subtitle = sprintf("Approx. %.1f%% of locations contribute to 80%% of sales", percentage_of_locations),
       x = "Location",
       y = "Percentage Contribution") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Print the chart
pareto_chart

Given this graph the top locations in terms of sales are contributing to a heavy percentage of the total sales of 2022 and 2023.

The top 33.3% of locations based on their cumulative sales contribution account for 80% of all the sales.

The top-performing 33.3% of Manor locations that contribute to the majority of sales are: ::: {.cell layout-align=“center”}

Click to show code

percentage_contributions
#>              Location Total Sales 2022 Total Sales 2023
#> 1             Monthey            49965            39803
#> 2              Sierre            32212            29549
#> 3      Marin-Epagnier            30228            26921
#> 4               Vevey            25720            23124
#> 5              Geneva            16068            15091
#> 6                Sion            15208            15159
#> 7            Fribourg            14792            14209
#> 8            Lausanne            13186            14869
#> 9  Chavannes-de-Bogis            14359            12797
#> 10  Yverdon-les-Bains            10919            10287
#> 11             Morges             8424             7940
#> 12        Biel/Bienne             6611             5998
#> 13            Vésenaz             4314             5027
#> 14           Delémont             4131             3782
#> 15               Nyon             3863             3370
#> 16       Ascona-Delta             2064             1797
#> 17             Lugano             1771             1458
#> 18              Vezia             1464             1680
#> 19              Emmen             1107             1170
#> 20            Balerna              827             1064
#> 21       San Antonino              935              948
#> 22              Basel             1002              558
#> 23              Baden              830              571
#> 24           St. Gall              848              490
#> 25            Sargans              884              431
#> 26               Chur              481              327
#> 27         Rapperswil              377              391
#>    Combined_Sales Percentage_Contribution Cumulative_Percentage
#> 1           89768                  17.903                  17.9
#> 2           61761                  12.318                  30.2
#> 3           57149                  11.398                  41.6
#> 4           48844                   9.741                  51.4
#> 5           31159                   6.214                  57.6
#> 6           30367                   6.056                  63.6
#> 7           29001                   5.784                  69.4
#> 8           28055                   5.595                  75.0
#> 9           27156                   5.416                  80.4
#> 10          21206                   4.229                  84.7
#> 11          16364                   3.264                  87.9
#> 12          12609                   2.515                  90.4
#> 13           9341                   1.863                  92.3
#> 14           7913                   1.578                  93.9
#> 15           7233                   1.443                  95.3
#> 16           3861                   0.770                  96.1
#> 17           3229                   0.644                  96.7
#> 18           3144                   0.627                  97.4
#> 19           2277                   0.454                  97.8
#> 20           1891                   0.377                  98.2
#> 21           1883                   0.376                  98.6
#> 22           1560                   0.311                  98.9
#> 23           1401                   0.279                  99.2
#> 24           1338                   0.267                  99.4
#> 25           1315                   0.262                  99.7
#> 26            808                   0.161                  99.8
#> 27            768                   0.153                 100.0
#>    Location_Count
#> 1               1
#> 2               2
#> 3               3
#> 4               4
#> 5               5
#> 6               6
#> 7               7
#> 8               8
#> 9               9
#> 10             10
#> 11             11
#> 12             12
#> 13             13
#> 14             14
#> 15             15
#> 16             16
#> 17             17
#> 18             18
#> 19             19
#> 20             20
#> 21             21
#> 22             22
#> 23             23
#> 24             24
#> 25             25
#> 26             26
#> 27             27
# Find the cutoff point where the cumulative percentage reaches or just exceeds 80%
cutoff_index <- which(percentage_contributions$Cumulative_Percentage >= 80)[1]

# Filter the data to include only the locations contributing to 80% of the sales
locations_80_percent <- percentage_contributions[1:cutoff_index, ] %>%
  select(Location, `Total Sales 2022`, `Total Sales 2023`, Combined_Sales, Percentage_Contribution)
locations_80_percent
#>             Location Total Sales 2022 Total Sales 2023
#> 1            Monthey            49965            39803
#> 2             Sierre            32212            29549
#> 3     Marin-Epagnier            30228            26921
#> 4              Vevey            25720            23124
#> 5             Geneva            16068            15091
#> 6               Sion            15208            15159
#> 7           Fribourg            14792            14209
#> 8           Lausanne            13186            14869
#> 9 Chavannes-de-Bogis            14359            12797
#>   Combined_Sales Percentage_Contribution
#> 1          89768                   17.90
#> 2          61761                   12.32
#> 3          57149                   11.40
#> 4          48844                    9.74
#> 5          31159                    6.21
#> 6          30367                    6.06
#> 7          29001                    5.78
#> 8          28055                    5.60
#> 9          27156                    5.42
# Create a reactable table
reactable_table <- reactable(locations_80_percent, columns = list(
  Location = colDef(name = "Location", sortable = TRUE, searchable = TRUE),
  'Total Sales 2022' = colDef(name = "Total Sales 2022"),
  'Total Sales 2023' = colDef(name = "Total Sales 2023"),
  Combined_Sales = colDef(name = "Combined Sales")), 
  highlight = TRUE,  # Highlight rows on hover
  defaultPageSize = 10,  # Display 10 rows per page
  paginationType = "numbers",  # Use numbers for page navigation
  searchable = TRUE,  # Make the table searchable
  sortable = TRUE,  # Allow sorting
  resizable = TRUE  # Allow column resizing
)
# Output the reactable table
reactable_table

:::

4.2.2 Understanding Success Factors of Top-Performing Stores

4.2.2.1 Correlating Political Parties with Milk Sales

Here, we will then make a scatterplot to identify if there is any correlation between any political party and sales of lait equitable. Our aim is to show that there might be a link with milk sales and a certain political party: are the sales correlated to a certain party presence? ::: {.cell layout-align=“center”}

Click to show code

# Calculate correlation coefficients for each party
correlation_df <- data.frame(Party = c("PLR", "PS", "UDC", "Centre", "Verts", "Vertliberaux"),
                             Correlation = sapply(merged_data[, 4:9], function(x) cor(x, merged_data$`2023`)))

# Print the correlation coefficients
print(correlation_df)
#>                     Party Correlation
#> PLR                   PLR      0.2796
#> PS                     PS      0.1853
#> UDC                   UDC     -0.1933
#> Centre             Centre      0.0375
#> Verts               Verts      0.1708
#> Vertliberaux Vertliberaux     -0.3715

# Create a matrix of plots for each party
party_plots <- lapply(names(merged_data)[4:9], function(party) {
  ggplot(merged_data, aes_string(x = "`2023`", y = party)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, color = "blue") +
    labs(x = "Annual Sales", y = paste(party, "Party Presence (%)"), title = paste("Correlation:", party, "Party vs. Sales")) +
    theme_minimal()
})
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> i Please use tidy evaluation idioms with `aes()`.
#> i See also `vignette("ggplot2-in-packages")` for more information.

# Arrange the plots in a matrix layout
matrix_plot <- gridExtra::grid.arrange(grobs = party_plots, ncol = 2)
matrix_plot
#> TableGrob (3 x 2) "arrange": 6 grobs
#>   z     cells    name           grob
#> 1 1 (1-1,1-1) arrange gtable[layout]
#> 2 2 (1-1,2-2) arrange gtable[layout]
#> 3 3 (2-2,1-1) arrange gtable[layout]
#> 4 4 (2-2,2-2) arrange gtable[layout]
#> 5 5 (3-3,1-1) arrange gtable[layout]
#> 6 6 (3-3,2-2) arrange gtable[layout]

:::

We can now notice that there is no specific correlation between a certain political party and sales of Lait équitable.

We will therefore proceed to select only the Locations where a political party has more than 20% presence, and then sum all the sales per political party, to see which party has the most influence over sales.

Click to show code


# Read sales data from data.qmd
sales_data_2023 <- sales_data %>%
  select(-`2022`)

# Filter party data to keep only values above 20
filtered_party_data <- party_data %>%
  filter(PLR > 20 | PS > 20 | UDC > 20 | Centre > 20 | Verts > 20 | Vertliberaux > 20)

# Create separate datasets for each political party
plr_data <- filtered_party_data %>%
  filter(PLR > 20) %>%
  select(Location, PLR)

ps_data <- filtered_party_data %>%
  filter(PS > 20) %>%
  select(Location, PS)

udc_data <- filtered_party_data %>%
  filter(UDC > 20) %>%
  select(Location, UDC)

centre_data <- filtered_party_data %>%
  filter(Centre > 20) %>%
  select(Location, Centre)

verts_data <- filtered_party_data %>%
  filter(Verts > 20) %>%
  select(Location, Verts)

vertliberaux_data <- filtered_party_data %>%
  filter(Vertliberaux > 20) %>%
  select(Location, Vertliberaux)

# Merge each party's data with sales data for 2023
plr_sales <- merge(sales_data_2023, plr_data, by.x = "Location")
ps_sales <- merge(sales_data_2023, ps_data, by.x = "Location")
udc_sales <- merge(sales_data_2023, udc_data, by.x = "Location")
centre_sales <- merge(sales_data_2023, centre_data, by.x = "Location")
verts_sales <- merge(sales_data_2023, verts_data, by.x = "Location")
vertliberaux_sales <- merge(sales_data_2023, vertliberaux_data, by.x = "Location")

# Calculate total sales for each party
plr_total_sales <- sum(plr_sales$`2023`)
ps_total_sales <- sum(ps_sales$`2023`)
udc_total_sales <- sum(udc_sales$`2023`)
centre_total_sales <- sum(centre_sales$`2023`)
verts_total_sales <- sum(verts_sales$`2023`)

# Create a data frame for total sales by party
total_sales_df <- data.frame(Party = c("PLR", "PS", "UDC", "Centre", "Verts"),
                             Total_Sales = c(plr_total_sales, ps_total_sales, udc_total_sales,
                                             centre_total_sales, verts_total_sales))

# Define colors for each party
party_colors <- c("PLR" = "blue", "PS" = "red", "UDC" = "darkgreen", "Centre" = "orange", "Verts" = "green")

# Sort the data frame by Total_Sales in descending order
total_sales_df <- total_sales_df[order(-total_sales_df$Total_Sales), ]

# Plot total sales by party in descending order with specified colors
ggplot(total_sales_df, aes(x = reorder(Party, -Total_Sales), y = Total_Sales, fill = Party)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Sales by Political Party in 2023", x = "Party", y = "Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = party_colors)

We can see that there actually is a lot of sales of lait équitable where PS is present more than 20%

4.2.2.2 Correlating average revenue with Milk Sales

Now, we want to see if there is some correlation between the income per taxpayer of a commune and its sales. ::: {.cell layout-align=“center”}

Click to show code

# Create a scatterplot
ggplot(merged_df, aes(x = `Revenu/contribuable`, y = `2022`)) +
  geom_point(aes(color = Location)) +
  labs(x = "Revenu/contribuable", y = "Sales 2022", title = "Relationship between Revenu/contribuable and Sales in 2022") +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

# Create another scatterplot for 2023
ggplot(merged_df, aes(x = `Revenu/contribuable`, y = `2023`)) +
  geom_point(aes(color = Location)) +
  labs(x = "Revenu/contribuable", y = "Sales 2023", title = "Relationship between Revenu/contribuable and Sales in 2023") +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

::: Here we can again see that there is no specific correlation, and that the correlation is even negative. We will then move on to our last part where we will try to correlate the sales of lait équitable with the proximity of milk producers.

4.2.2.3 Correlating Distance to Producers

Does being close to a Producer influence Sales ? What we do here is to calculate a distance matrix in order to determine the proximity of each city to the producers. We then analyze the correlation between the total sales and the minimum distance to the producer. This analysis helps us understand if the proximity to the producer has any significant impact on the sales of Lait Equitable products.

The distance Matrix is shown as follow :

Click to show code

from geopy.distance import geodesic
latitudes = []
longitudes = []

# Parse each location and extract latitude and longitude
for location in locations:
    lat, lon = location.split(',')
    latitudes.append(float(lat))
    longitudes.append(float(lon))

# Create a DataFrame using pandas
producers = pd.DataFrame({
    'Latitude': latitudes,
    'Longitude': longitudes
})

#get the df from the python code
cities = df['City']

# Initialize an empty DataFrame to store distances
distance_matrix = pd.DataFrame(index=cities, columns=[f"Producer {i+1}" for i in range(len(producers))])

# Calculate distances and fill the DataFrame
for city in cities:
    city_lat, city_lon = get_lat_lon(city)
    if city_lat is not None and city_lon is not None:
        city_coords = (city_lat, city_lon)
        for index, producer in producers.iterrows():
            producer_coords = (producer['Latitude'], producer['Longitude'])
            distance = geodesic(city_coords, producer_coords).kilometers  # distance in kilometers
            distance_matrix.loc[city, f"Producer {index+1}"] = distance
            

# Flatten the DataFrame to get all distance values in one series
all_distances = distance_matrix.values.flatten()

Click to show code

#get the distance_matrix in r
distance_matrix <- py$distance_matrix

#use reactable to show the table 
#show it using reactable
reactable(
  distance_matrix,  
  highlight = TRUE,  # Highlight rows on hover
  defaultPageSize = 10,  # Display 10 rows per page
  paginationType = "numbers",  # Use numbers for page navigation
  searchable = TRUE,  # Make the table searchable
  sortable = TRUE,  # Allow sorting
  resizable = TRUE  # Allow column resizing
)

We can see here the distribution of distances between cities and producers. The histogram shows the frequency of distances between cities and producers, providing insights into the geographical distribution of producers and their proximity to cities.

Statistic	Distance
Minimum Distance	0.68 km
Maximum Distance	283.46 km
Average Distance	116.73 km
Median Distance	117.62 km

Click to show code

# Basic statistics
print("Distance Statistics:")
#> Distance Statistics:
print("Minimum Distance: {:.2f} km".format(all_distances.min()))
#> Minimum Distance: 0.68 km
print("Maximum Distance: {:.2f} km".format(all_distances.max()))
#> Maximum Distance: 283.30 km
print("Average Distance: {:.2f} km".format(all_distances.mean()))
#> Average Distance: 116.73 km
print("Median Distance: {:.2f} km".format(np.median(all_distances)))
#> Median Distance: 117.62 km

# Histogram of the distances
plt.figure(figsize=(10, 6))
plt.hist(all_distances, bins=30, color='#24918d', alpha=0.7)
plt.title('Distribution of Distances Between Cities and Producers')
plt.xlabel('Distance in km')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

We see that the correlation is non-existent between the total sales and the distance to the producer. This suggests that the proximity to the producer does not significantly influence the sales of Lait Equitable products in the analyzed locations.

Correlation between Total Sales 2022 and Min Distance to Producer:	Correlation between Total Sales 2023 and Min Distance to Producer:
1.000000	0.014807
0.014807	1.000000

Click to show code

df_sales = r.get('df_merged_sales')
#rename 'Location' column as 'City'
df_sales.rename(columns={'Location': 'City'}, inplace=True)
df_sales.set_index('City', inplace=True)

# Calculate the minimum distance for each city and add it to df_sales
df_sales['Min Distance to Producer'] = distance_matrix.min(axis=1)

# Calculate Pearson correlation
correlation_2022 = df_sales[['Total Sales 2022', 'Min Distance to Producer']].corr(method='pearson')
correlation_2023 = df_sales[['Total Sales 2023', 'Min Distance to Producer']].corr(method='pearson')

print("Correlation between Total Sales 2022 and Min Distance to Producer:")
#> Correlation between Total Sales 2022 and Min Distance to Producer:
print(correlation_2022)
#>                           Total Sales 2022  Min Distance to Producer
#> Total Sales 2022                  1.000000                  0.014739
#> Min Distance to Producer          0.014739                  1.000000

print("Correlation between Total Sales 2023 and Min Distance to Producer:")
#> Correlation between Total Sales 2023 and Min Distance to Producer:
print(correlation_2023)
#>                           Total Sales 2023  Min Distance to Producer
#> Total Sales 2023                  1.000000                  0.033715
#> Min Distance to Producer          0.033715                  1.000000

# Convert 'Total Sales 2022' and 'Min Distance to Producer' to numeric types explicitly
df_sales['Total Sales 2022'] = pd.to_numeric(df_sales['Total Sales 2022'], errors='coerce')
df_sales['Min Distance to Producer'] = pd.to_numeric(df_sales['Min Distance to Producer'], errors='coerce')


# Plotting Total Sales 2022 vs. Min Distance to Producer
plt.figure(figsize=(10, 6))
sns.regplot(
    x='Min Distance to Producer', 
    y='Total Sales 2022', 
    data=df_sales,
    scatter_kws={'s': 50, 'color': '#7e57c2'},  # Customizing the scatter plot points
    line_kws={'color': '#33848D', 'lw': 2}  # Customizing the regression line
)
plt.title('Total Sales 2022 vs. Minimum Distance to Producer')
plt.xlabel('Minimum Distance to Producer (km)')
plt.ylabel('Total Sales 2022')
plt.grid(True)
plt.show()

Click to show code


# Plot for 2023
plt.figure(figsize=(10, 6))
sns.regplot(x='Min Distance to Producer', y='Total Sales 2023', data=df_sales, 
            scatter_kws={'s': 50, 'color': '#33848D'}, line_kws={'color': '#7e57c2'})
plt.title('Total Sales 2023 vs. Min Distance to Producer')
plt.xlabel('Minimum Distance to Producer (km)')
plt.ylabel('Total Sales 2023')
plt.grid(True)
plt.show()

4.2.2.4 Diving Deeper

To dive a bit deeper into the insights on how the sales of each manor is influenced, we’ll write a Python script using the pandas library. The goal is to iterate through each city (column in the distance_matrix DataFrame), find the producer (row) with the minimum distance for that city, and then tally the number of times each producer is the closest to any city. Finally, we’ll create a DataFrame to display the number of times each producer was closest to a city.

Here is a step-by-step guide and the corresponding code:

Import the pandas library. Load the data into a DataFrame: We’ll assume the data you provided is in a CSV or Excel file. If it’s in another format, you can adjust the loading method accordingly. Initialize a DataFrame to keep track of the scores for each producer. Iterate through each column (city) in the distance_matrix_t DataFrame, find the index of the minimum distance, and update the score for the respective producer. Display the final DataFrame with the scores.

Click to show code

# Initialize a DataFrame to store scores for each producer
scores = pd.DataFrame(0, index=distance_matrix.index, columns=['Score'])

# Iterate through each city (column) to find the producer with the minimum distance
for city in distance_matrix.columns:
    min_distance_producer = distance_matrix[city].idxmin()
    scores.loc[min_distance_producer, 'Score'] += 1

#merges scores with df_sales
df = df_sales.merge(scores, left_index=True, right_index=True, how='left')
correlation_matrix = df[['Total Sales 2022', 'Total Sales 2023', 'Score']].corr()

# Display the correlation matrix
correlation_matrix
#>                   Total Sales 2022  Total Sales 2023     Score
#> Total Sales 2022          1.000000          0.994222  0.250883
#> Total Sales 2023          0.994222          1.000000  0.268046
#> Score                     0.250883          0.268046  1.000000

Here is the correlation matrix for the total sales in 2022, total sales in 2023, and the score calculated based on the proximity of each producer to the cities

	Total Sales 2022	Total Sales 2023	Score
Total Sales 2022	1.000000	0.994222	0.250883
Total Sales 2023	0.994222	1.000000	0.268046
Score	0.250883	0.268046	1.000000

It is better than the minimum distance correlation, but it is still not very high. This suggests that the proximity of a producer to a city, as measured by the score, has a moderate positive correlation with the total sales in both 2022 and 2023.

Click to show code

# Plotting 'Sales 2022' vs 'Score'
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)  # 1 row, 2 columns, 1st subplot
sns.regplot(x='Total Sales 2022', y='Score', data=df, ci=None, scatter_kws={'color': '#7e57c2'}, line_kws={'color': '#33848D'})
plt.title('Sales 2022 vs. Score')

# Plotting 'Sales 2023' vs 'Score'
plt.subplot(1, 2, 2)  # 1 row, 2 columns, 2nd subplot
sns.regplot(x='Total Sales 2023', y='Score', data=df, ci=None, scatter_kws={'color': '#33848D'}, line_kws={'color': '#7e57c2'})
plt.title('Sales 2023 vs. Score')

# Show the plots
plt.tight_layout()
plt.show()

5 Conclusion

In our detailed exploration of the Lait Equitable dataset, which focuses on the ins and outs of fair trade milk production in Switzerland, we’ve uncovered some interesting insights and important considerations for both sustainable farming and fair payment methods. Despite these findings, it’s important to note that the small size of our dataset has somewhat limited the strength of our conclusions

5.1 Key Findings

5.1.1 Price Trends

Both organic and conventional milk exhibit distinct pricing trends, with organic milk consistently commanding higher prices. This underscores the market’s value perception of organic products.

Forecasting Models: The SARIMA and Holt-Winters models provided robust future price predictions, revealing a stable to slightly increasing trend for organic milk prices, while conventional milk prices are expected to increase less
Sales Analysis: The distribution analysis across different Manor store locations indicated strong variability in sales, highlighting areas with scope for strategic marketing and operational enhancements.
Pareto Analysis: Applying the Pareto Principle revealed that approximately 33.3% of the locations accounted for nearly 80% of the total sales. This insight suggests a significant concentration of sales among a subset of stores, which is critical for targeted marketing and resource allocation strategies.

5.1.2 Implications

Organic Milk Promotion: The stable pricing forecast for organic milk could encourage more producers to adopt organic farming, supporting environmental sustainability and potentially enhancing income due to higher price points.
Tailored Marketing Strategies: The variability in sales across locations suggests that Lait Equitable could potentially benefit from a more tailored regional marketing strategy, possibly adapting product offerings based on regional preferences and sales performance.
Focused Resource Allocation: Insights from the Pareto analysis allow for optimized resource allocation, focusing efforts on high-performing locations to maximize returns and implementing targeted improvements in underperforming areas or drop them could be a strategic move to consider.

5.2 Limitations and Future Research

Significant Data Limitations: A major limitation of our study is the small size of the dataset, which made extracting statistically significant results almost a mission impossible. This constraint has potentially skewed the accuracy of our findings and forecasting models, leading to less confidence in the predictability of future trends based on this data alone.
Data Scope: Additionally, the focus on relatively short-term data for forecasting could be expanded in future studies to refine predictions and adjust for market shocks.
Manor Sales Dataset: Further research could delve deeper into a Manor sales dataset which could be highly beneficial in order to compare our products with other product in the Manor store. This would provide a more comprehensive understanding of the market dynamics and competitive landscape. Unfortunately, this data was not available for our analysis.
Sustainability Metrics: Future research could also explore incorporating sustainability metrics into the analysis, such as carbon footprint or water usage, to provide a more holistic view of the environmental impact of different milk production methods.

5.3 Take-Home Message

Lait Equitable stands as an interesting initiative in promoting fair compensation in the dairy industry. Despite the significant challenges posed by the limited dataset, our findings provide a foundation for understanding market dynamics and highlight areas for operational improvement. Continuing to leverage data-driven approaches, through a larger and more comprehensive dataset, will be really insightful for Lait Equitable to enhance its strategies and ensure the sustainability and profitability of Swiss dairy farmers. Future endeavors could also explore expanding product lines or new markets based on more robust data analyses, always with a focus on sustainable and equitable practices.