Search and download data from the Swiss Federal Statistical Office
The BFS
package allows to search and download public
data from the
Swiss
Federal Statistical Office (BFS stands for Bundesamt für
Statistik in German) API in a dynamic and reproducible way.
install.packages("BFS")
You can also install the development version from Github.
::install_github("lgnbhl/BFS") devtools
library(BFS)
Retrieve the list of publicly available datasets from the data
catalog in any language (“de”, “fr”, “it” or “en”) by calling
bfs_get_catalog_data()
.
<- bfs_get_catalog_data(language = "en")
catalog_data_en
catalog_data_en
## # A tibble: 180 × 7
## title language publication_date url_bfs url_px guid catalog_date
## <chr> <chr> <dttm> <chr> <chr> <chr> <dttm>
## 1 Provis… en 2023-04-04 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 2 Perman… en 2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 3 Privat… en 2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 4 Deaths… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 5 Divorc… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 6 Live b… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 7 Marria… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 8 Acquis… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 9 Acquis… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 10 Demogr… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## # ℹ 170 more rows
English and Italian data catalogs offer a limited list of datasets. For the full list please get the French (“fr”) or German (“de”) data catalogs.
The function bfs_get_data()
allows you to download any
dataset for the data catalog.
To download a specific dataset, you can either use the
url_bfs
or the number_bfs
.
The url_bfs
argument refers to the offical webpage of a
dataset. Find below an example getting the url_bfs
for a
specific dataset using the R package “dplyr”.
library(dplyr)
<- catalog_data_en |>
url_bfs_uni_students ::filter(title == "University students by year, ISCED field, sex and level of study") |>
dplyr::pull(url_bfs)
dplyr
url_bfs_uni_students
## [1] "https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.24367729.html"
Then you can download the full dataset using
url_bfs
.
# https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.16324907.html
<- bfs_get_data(url_bfs = url_bfs_uni_students, language = "en") df_uni
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
df_uni
## # A tibble: 18,060 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 1980/81 Education science Male First university degr… 545
## 2 1980/81 Education science Male Bachelor 0
## 3 1980/81 Education science Male Master 0
## 4 1980/81 Education science Male Doctorate 93
## 5 1980/81 Education science Male Further education, ad… 13
## 6 1980/81 Education science Female First university degr… 946
## 7 1980/81 Education science Female Bachelor 0
## 8 1980/81 Education science Female Master 0
## 9 1980/81 Education science Female Doctorate 70
## 10 1980/81 Education science Female Further education, ad… 52
## # ℹ 18,050 more rows
It is recommended to privilege the use of the number_bfs
argument for stability and reproducibility.
You can manually find the number_bfs
by opening the
official webpage and looking for the “FSO number”, as showed below.
# open Uni students dataset webpage
browseURL(url_bfs_uni_students)
Then you can download the dataset using number_bfs
.
bfs_get_data(number_bfs = "px-x-1502040100_131", language = "en")
You can also access additional information about the dataset by
running bfs_get_data_comments()
.
bfs_get_data_comments(number_bfs = "px-x-1502040100_131", language = "en")
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
## # A tibble: 1 × 4
## row_no col_no comment_type comment
## <int> <int> <chr> <chr>
## 1 NA 4 column_comment "To ensure that the presentations from cubes con…
You may get an error message if the query is too large.
Error:
Too large query.
The smallest batch size is 24030 and the maximum number of values
that can be downloaded through the API is 5000.
You may also have an error if the API calls too many requests.
Error in pxweb_advanced_get(url = url, query = query, verbose = verbose) :
Too Many Requests (RFC 6585) (HTTP 429).
One solution is to query only specific elements of the dataset to download less data. Here an example.
First you want to get the variable names, i.e. code
, and
categories, i.e. values
, of your dataset.
# choose a BFS number and language
<- bfs_get_metadata(number_bfs = "px-x-1502040100_131", language = "en")
metadata
str(metadata)
## tibble [4 × 7] (S3: tbl_df/tbl/data.frame)
## $ code : chr [1:4] "Jahr" "ISCED Fach" "Geschlecht" "Studienstufe"
## $ text : chr [1:4] "Year" "ISCED Field" "Sex" "Level of study"
## $ values :List of 4
## ..$ : chr [1:43] "0" "1" "2" "3" ...
## ..$ : chr [1:42] "0" "1" "2" "3" ...
## ..$ : chr [1:2] "0" "1"
## ..$ : chr [1:5] "0" "1" "2" "3" ...
## $ valueTexts :List of 4
## ..$ : chr [1:43] "1980/81" "1981/82" "1982/83" "1983/84" ...
## ..$ : chr [1:42] "Education science" "Teacher training without subject specialisation" "Teacher training with subject specialisation" "Fine arts" ...
## ..$ : chr [1:2] "Male" "Female"
## ..$ : chr [1:5] "First university degree or diploma" "Bachelor" "Master" "Doctorate" ...
## $ time : logi [1:4] TRUE NA NA NA
## $ elimination: logi [1:4] NA TRUE TRUE TRUE
## $ title : chr [1:4] "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study"
Then you can manually select the dimensions of the dataset you want to query.
# Manually create BFS query dimensions
# Use `code` and `values` elements
# Use "*" to select all
<- list(
dimensions "Jahr" = c("40", "41"),
"ISCED Fach" = c("0"),
"Geschlecht" = c("*"),
"Studienstufe" = c("2", "3"))
# Query BFS data with specific dimensions
::bfs_get_data(
BFSnumber_bfs = "px-x-1502040100_131",
language = "en",
query = dimensions
)
## # A tibble: 8 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 2020/21 Education science Male Master 151
## 2 2020/21 Education science Male Doctorate 121
## 3 2020/21 Education science Female Master 555
## 4 2020/21 Education science Female Doctorate 306
## 5 2021/22 Education science Male Master 143
## 6 2021/22 Education science Male Doctorate 115
## 7 2021/22 Education science Female Master 599
## 8 2021/22 Education science Female Doctorate 318
A lot of tables are not accessible through the official API, but they
are still present in the official
BFS website. You can search for specific tables using
bfs_get_catalog_tables()
.
<- bfs_get_catalog_tables(language = "en", title = "students")
catalog_tables_en_students
catalog_tables_en_students
## # A tibble: 5 × 7
## title language publication_date url_bfs url_table guid catalog_date
## <chr> <chr> <dttm> <chr> <chr> <chr> <dttm>
## 1 Stud… en 2023-04-05 00:00:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 2 Stud… en 2023-04-05 00:00:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 3 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 4 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 5 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
Most of the BFS tables are Excel or CSV files. For example, you can
use the “openxlsx” R package to read a specific Excel table using the
url_table
column.
library(dplyr)
library(openxlsx)
<- catalog_tables_en_students |>
tables_bfs_uni_students ::slice(3) |>
dplyr::pull(url_table)
dplyr
<- tryCatch(expr = openxlsx::read.xlsx(tables_bfs_uni_students, startRow = 1),
df_table error = function(e) "Failed reading table") |>
::as_tibble()
dplyr
df_table
## # A tibble: 6 × 2
## Students.at.Universities.and.Institutes.of.Technology.(UIT).2022/23:.S…¹ X2
## <chr> <chr>
## 1 "Definitions " Defi…
## 2 "Tab 1" Entr…
## 3 "Tab 2" Stud…
## 4 "Tab 3" Stud…
## 5 "Tab 4" Stud…
## 6 "Tab 5" Fore…
## # ℹ abbreviated name:
## # ¹`Students.at.Universities.and.Institutes.of.Technology.(UIT).2022/23:.Standard.Tables`
Under the hood, this package is using the pxweb package to query the Swiss Federal Statistical Office PXWEB API. PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way.
You can clean the column names of the datasets automatically using
janitor::clean_names()
by adding the argument
clean_names = TRUE
in the bfs_get_data()
function.
This package is in no way officially related to or endorsed by the Swiss Federal Statistical Office (BFS).
Any contribution is strongly appreciated. Feel free to report a bug, ask any question or make a pull request for any remaining issue.