Workers VS Unemployed

IN INDONESIA

This project analyzes several datasets on unemployment and workers in Indonesia. Some of the data used may be biased because of technical issues and possible political interference. Therefore, please understand that the results of this analysis are likely to be inaccurate, as I am aware that assumptions built upon other assumptions lead to conclusions that deviate further from reality.

However, if you are willing to read this project further, let us continue. I sincerely thank you for taking the time to read this simple project.

The data to be used in this project contains many biases for several reasons:

Some of the data provided by BPS are backcast results. Backcasting itself is a method that only produces estimated values rather than actual data.
Even if the data is not a result of backcasting, it may still be inaccurate due to several technical issues. As is widely known and considered an open secret in Indonesia, survey respondents sometimes provide inaccurate information to influence the government into granting them aid.
One example of manipulation by some individuals is reporting that a family member is unemployed and still young. This is often done to qualify for financial assistance known as the "Prakerja" program.

Now that you are aware of these biases and limitations, let's move on to the project and examine the data in more detail.

The purpose of this data analysis is to understand the movement of unemployment and employment rates.

So the main question is:
Is the number of workers in Indonesia increasing (which would be a good sign), or is it decreasing (which would be very bad)?

Data -

After exploring the BPS (Badan Pusat Statistik) website, I obtained 12 types of data to support our analysis this time:

Employment and unemployment data.
Open unemployment data - by region.
Open unemployment data - by gender.
Open unemployment data - by education level.
Open unemployment data - by province.
Open unemployment data - by age group.
Underemployment data - by region.
Underemployment data - by gender.
Underemployment data - by education level.
Underemployment data - by province.
Underemployment data - by age group.
Core data - job vacancy fulfillment.

Employment and unemployment data -

This dataset contains the number of employed and unemployed people in Indonesia.

Category / status: employment, unemployment.
Year.
Time / month: February, August, Yearly.
Percentage.
Number (thousand).

Open unemployment data - by region

This dataset contains the number of people who are not working at all and are looking for a job, filtered by the region they live in.

Category / status: city, village / town.
Year.
Percentage.

Open unemployment data - by gender

This dataset contains the number of people who are not working at all and are looking for a job, filtered by gender.

Category / status: male, female.
Year.
Percentage.

Open unemployment data - by education level

This dataset contains the number of people who are not working at all and are looking for a job, filtered by education level.

Category / status:
- Not attending school / not yet graduated, not graduated and graduated from elementary school.
- Junior high school.
- General high school.
- Vocational high school.
- Diploma 1/2/3.
- University.
Year.
Percentage.

Open unemployment data - by province

This dataset contains the number of people who are not working at all and are looking for a job, filtered by province.

DATA CLEANING -

In this process, what is usually called data cleaning is more accurately referred to as data reorganization. This is because the data obtained from BPS is already clean. However, its format or structure may not be optimal for analysis, so some adjustments are necessary.
I will not explain each file one by one or detail every step of the reorganization process. Instead, I will outline a series of steps I take when encountering specific conditions in the data.

THE KEY STEPS ARE :

Remove data that is irrelevant to the information provided by the dataset

Select the data range, import it into Power Query, and apply transformations based on the predefined considerations as follows:

>>

If the data is in the form of a pivot table, I will unpivot it.

Rotate to unpivot

If unpivoting the data results in empty cells, I will fill them downward.

<<

If any headers become empty after restructuring, I will add appropriate labels.
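The unpivot and fill-down steps above are done in Power Query in this project. As a rough pandas analogue only, they could be sketched as follows. The table, its column names, and its values are made up for illustration and are not taken from the BPS files.

```python
import pandas as pd

# Hypothetical pivot-style sheet: the status label appears only on the first
# row of each group (as with merged cells) and each year is its own column.
raw = pd.DataFrame({
    "Status": ["Employed", None, "Unemployed", None],
    "Period": ["February", "August", "February", "August"],
    "2018": [10.0, 11.0, 2.0, 2.5],
    "2019": [10.5, 11.5, 2.1, 2.4],
})

# Unpivot: turn the year columns into rows (one row per status/period/year).
tidy = raw.melt(
    id_vars=["Status", "Period"],
    var_name="Year",
    value_name="Value",
)

# Fill the empty label cells downward, similar to Power Query's "Fill Down".
tidy["Status"] = tidy["Status"].ffill()

# The year labels come from column headers, so convert them to integers.
tidy["Year"] = tidy["Year"].astype(int)

print(tidy)
```

Filling downward only gives the right labels when the rows are still in their original order, which is the case here because `melt` stacks the year columns one after another without reordering the rows inside each block.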

Data modeling -

The data I use for this analysis comes from multiple separate files. For example, the Employment and unemployment data comes in 6 files. Each file contains the employment and unemployment numbers recorded for one year, so the 6 files cover 6 years: 2018, 2019, 2020, 2021, 2022, and 2023.

Before proceeding to the data visualization step, it is best to first combine data from multiple files into their respective categories based on their relationships.

If a data category is better suited for the append rows method, then all files should be combined using this approach. Conversely, if a data category is better suited for the merge method, then merging should be applied accordingly.

see the data

Open unemployment data - by age

This dataset contains the number of people who are not working at all and are looking for a job, filtered by age.

Category / status: age group.
Year.

Underemployment data - by region

This dataset contains the number of people who are not working full time or not using their full skills, filtered by region.

Underemployment data - by gender

This dataset contains the number of people who are not working full time or not using their full skills, filtered by gender.

Category / status: male, female.
Year.
Percentage.

Underemployment data - by education level

This dataset contains the number of people who are not working full time or not using their full skills, filtered by education level.

Category / status:
- Not attending school / not yet graduated, not graduated and graduated from elementary school.
- Junior high school.
- General high school.
- Vocational high school.
- Diploma 1/2/3.
- University.
Year.
Percentage.

Underemployment data - by province

This dataset contains the number of people who are not working full time or not using their full skills, filtered by province.

Category / status: province.
February
August

Underemployment data - by age

This dataset contains the number of people who are not working full time or not using their full skills, filtered by age.

Category / status: age group.
Year.

Clean Data

Category / status: city, village / town.
Year.
Percentage.

Regardless of the type or category of data being combined, the key point is that the data used in this analysis is time series data—whether it is monthly or yearly. The main focus of this analysis is to observe changes in a variable's values over a specific time interval.
Since the data to be analyzed consists of changes in a variable's value over a specific time interval, the data merging method should also depend on how the data is presented over time. For example:

In the "by region" dataset, the time intervals are arranged across multiple columns (horizontally), such as Column 2018, Column 2019, and so on. Therefore, the most suitable merging method is Merge, using the matching rows of the region column as the key.

In the second example, the "Employment and unemployment" dataset is structured vertically, with a column that records the year for each row. This means the time intervals are arranged vertically. Therefore, the recommended merging method is Append, combining rows from other files accordingly.

However, in the main dataset, the case is slightly different. This dataset does not contain columns with time interval data arranged in rows, nor does it have a column explicitly indicating the time interval for each row. Therefore, an additional preprocessing step is required.

First, add a Year column (since each file represents a specific year). For example, in the 2018 file, fill the Year column with 2018 for all rows, and do the same for other files accordingly.

Once this step is completed, the data can then be merged using the Append method.

And just as I did in the data cleaning process, I will not explain each step in detail for every file. I believe the explanation above is sufficient to describe the steps I will take based on the situations mentioned and the considerations outlined. However, I will still present the final results.

*I will use Power Query for all dataset merging processes. I will also remove all annual rows from any dataset because their content is "-", and I am unsure how to fill them appropriately. If I were to fill them with the average of data from February and August, I believe that would be unfair, considering that the average would be based on only two samples, while the total population in this context represents 12 months. Therefore, I have decided to remove the annual rows entirely.

*When creating visualizations based on numerical trends over time, all data will be transformed into a vertical time-series format. This is done to facilitate visualizations that focus on time-based data progression.
The exception is the provincial underemployment data, for which I plan to use a slightly different method.
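The actual combining in this project is done in Power Query. Purely as an illustration of the Merge, Append, and annual-row decisions described in this section, a pandas sketch might look like the one below. Every table, column name, and number is a made-up placeholder, not BPS data.

```python
import pandas as pd

# Case 1: each file holds one year as a column -> Merge on the shared key.
region_2018 = pd.DataFrame({"Region": ["Urban", "Rural"], "2018": [6.4, 4.0]})
region_2019 = pd.DataFrame({"Region": ["Urban", "Rural"], "2019": [6.3, 3.9]})
by_region = region_2018.merge(region_2019, on="Region", how="outer")

# Case 2: a file has no time information at all -> add a Year column first,
# then Append (stack) the rows of all files.
core_2018 = pd.DataFrame({"Province": ["A", "B"], "Job seekers": [100, 80]})
core_2019 = pd.DataFrame({"Province": ["A", "B"], "Job seekers": [110, 70]})
core = pd.concat(
    [core_2018.assign(Year=2018), core_2019.assign(Year=2019)],
    ignore_index=True,
)

# Annual rows whose value is only the "-" placeholder are removed.
status = pd.DataFrame({
    "Period": ["February", "August", "Yearly"],
    "Value": ["5.1", "5.3", "-"],
})
status = status[status["Value"] != "-"].reset_index(drop=True)

print(by_region)
print(core)
print(status)
```

An outer merge is used in the first case so that a region missing from one year's file is kept with an empty value instead of being dropped silently. That is a choice made for this sketch, not something the source specifies.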

ASKING -

In a typical data analysis workflow, asking the right questions comes at the very beginning. However, in this project, the questioning phase comes after the data has been cleaned. Why? Because the data source in this project is relatively incomplete. Therefore, I have determined that the process used in this data analysis will be slightly different.

In this case, the process of "asking questions" in data analysis is better approached by posing the core questions first, followed by technical and detailed questions later, after all the data has been cleaned and gathered.

With this workflow, I believe the core questions are answered first, while technical and detailed questions can follow later. I think this approach is reasonable and well-accepted.

The core question is: "Has the trend of unemployment rates in Indonesia over time been getting better (meaning the unemployment rate is decreasing) or getting worse (meaning it is increasing)?"

Considering the main question, which aims to identify data trends, and the data obtained from BPS, the following questions are worth considering:

Is job fulfillment in Indonesia effective?
The development of unemployment rates vs. workers over time.
Over time, which gender has more unemployed individuals, male or female?
Over time, is unemployment rising more in rural areas or increasing more in urban areas?
Is the development of unemployment data always relevant to a person's level of education?
Do young workers contribute more to the unemployment rate, or is it the older ones?
Are the top provinces with the highest unemployment rates always occupied by the same or similar provinces, or do they vary over time?

Answering with data visualization -

Is job fulfillment in Indonesia effective?

Tableau will be chosen for the data visualization part of this project.

Is the fulfillment of labor for available job opportunities effective in Indonesia? To answer this question, the main dataset will be used, as it contains relevant information to provide an answer. Columns such as job seekers, job vacancies, and labor fulfillment are clearly present in this dataset.

There are three main categories in this dataset: job seekers, job vacancies, and labor fulfillment. I did not receive accurate information from the data author regarding the exact meaning of labor fulfillment in this dataset. However, if it is referred to as job placement/labor fulfillment, it usually pertains to job vacancies or positions that have been successfully filled. Therefore, this will serve as the primary benchmark in the analysis of this data.

*Important Note: It is crucial to exclude the "Indonesia" row from the visualization since this data visualization will focus on provincial blocks rather than Indonesia as a whole. For now, the visualization will be more concentrated on each individual province.

The main dataset consists of 2 pathways and 3 classes. What I mean by "2 pathways" are the routes that will be used in the data analysis for this dataset.

For this main dataset, analysis can be conducted if the data is examined through one or both of the available pathways. The first is through the recorded region (in this dataset, the province), and the second is through the time sequence.

Meanwhile, the 3 classes refer to male, female, and total. The total class itself contains the sum of the male and female data. The total class is necessary to assess how much the difference between male and female numbers influences the overall result.

Since there are two available pathways, I want to start with the recorded region pathway (province). After that, I will proceed with the time series pathway.

Based on the data presented in the visualization above, it is clear that something concerning is happening. Regardless of the class, the number of job seekers is significantly higher than the number of available job openings. This is followed by a slightly lower number of job placements, which, while closer to the number of available job openings, still does not reach the required level.

From this visualization, we can conclude that the number of job seekers in Indonesia is extremely high. However, this high number is not matched by an adequate number of job opportunities. Even when job openings are available, not all of them fully absorb the labor force.

If this situation continues, the unemployment rate in Indonesia will continue to rise.

After analyzing the data on total, female, and male categories in terms of job openings, job seekers, and job placements by province, let’s continue with the analysis of job openings, job seekers, and job placements for males, females, and the total, also based on province. This will help identify which category plays a dominant role in shaping job openings, job seekers, and job placements data.

Based on the visualization above, it can be inferred that whether in job openings, job placements, or job seekers data, the majority (in almost all provinces) is consistently dominated by males, while females follow behind.

Based on the visualization above, it can indeed be concluded that the majority of the data is dominated by males. However, these figures only provide an overview without detailed insights. Therefore, in the next visualization, a deeper analysis will be conducted to determine the exact percentage of male dominance and female subordination in shaping the data.

Additionally, the next visualization will incorporate a time series parameter to observe the development of gender dominance over the years. This will help determine whether males consistently dominate the data significantly or if, in certain years, their dominance weakens.

In the job seeker category, males slightly dominated the data in 2018. However, in 2019, females overtook the dominance, though only by a small margin. In contrast, in 2020 and 2023, males dominated with a noticeably wider gap, unlike the close competition between males and females in 2018 and 2019.

Job Openings Data – From 2018 to 2019, females briefly dominated with a noticeable gap compared to males. However, in the 2020 and 2023 job openings data, the visualization shows a significant shift, with males taking the lead by a substantial margin. In those two years, male dominance even exceeded 60 percent. Personally, I consider that a strong enough figure to indicate a drastic difference.

Job Placement Data – Its pattern is not much different from the job openings data. From 2018 to 2019, the data was dominated by females, while from 2020 to 2023, males took the lead. The only noticeable difference is that in 2018 and 2019, female dominance in job placements was slightly higher than in job openings during the same years.

In job openings data for 2018 and 2019, female dominance ranged between 56% and 53%. Meanwhile, in the same years, job placement data showed an even higher female dominance, ranging between 54% and 58%.
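The percentages discussed above can be reproduced with simple arithmetic, assuming each one is a gender's share of the yearly total (male plus female). That reading is my assumption, and the counts below are invented for illustration rather than taken from the dataset.

```python
import pandas as pd

# Made-up yearly counts for one category (for example, job openings).
openings = pd.DataFrame({
    "Year": [2018, 2019, 2020],
    "Male": [450, 480, 630],
    "Female": [550, 520, 370],
})

# Each gender's share of the yearly total, as a percentage.
total = openings["Male"] + openings["Female"]
openings["Male share (%)"] = (openings["Male"] / total * 100).round(1)
openings["Female share (%)"] = (openings["Female"] / total * 100).round(1)

print(openings)
```

Working with shares rather than raw counts makes years with very different totals comparable, which is what the year-by-year comparison relies on.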

Is job fulfillment in Indonesia effective? The answer is clearly NO. Even from the first visualization, it is evident that the number of job seekers is FAR HIGHER than the available job openings. On top of that, the available job positions are not even fully occupied.

Knowing that males dominate the data is just a bonus insight because, at most, the male share reaches only a little over 60 percent. This suggests there is no severe sampling bias where one side is disproportionately represented in the data.

But let’s return to the main question: Is job fulfillment in Indonesia effective? Considering that the number of job seekers is overwhelmingly high compared to the available job openings, and that not all job positions can absorb the workforce to their full capacity, my final answer is a definite NO!

The development of unemployment rates vs. workers over time.

Actually, it would be inaccurate to consider this a question because the phrase "The development of unemployment rates vs. workers over time." seems more appropriate as a task description. Rather than a question, it is better described as an analytical assignment that involves examining the data.
However, regardless of what it is classified as, one thing is clear: the data from the Employment and unemployment dataset is essential for carrying out this task.

Based on the pie chart generated from the data, it is evident that the ratio of employed individuals to unemployed individuals has remained relatively consistent from year to year. Even when differences do exist, the gap is only around 1%. This indicates that there has been no significant change in this disparity over the years.

Over time, which gender has more unemployed individuals, male or female?

For this question and the ones that follow, two datasets will be used: one representing full unemployment and the other representing underemployment. This approach is necessary because the collected data comes from these two different categories.

However, before proceeding with the visualization, it is advisable to transform any time-sequenced data that is structured horizontally (one column per year) into a vertically structured time-series format, with some column name adjustments to ensure consistency.

This transformation is necessary to simplify the process of comparing the development of multiple values over a specific time series. The type of visualization used for comparison will be created using the dual-axis method.

Two visualizations represent the two datasets: full unemployment and underemployment.

In the visualization of underemployment trends over the years, the values remained relatively stable from 2018 to 2019, with only a slight decline in 2019. However, in 2020, there was a significant spike. After that, the values gradually decreased until 2022, followed by a slight rebound in 2023.

The full unemployment visualization follows a similar pattern to the underemployment visualization. However, the key difference lies in the 2022 to 2023 values.

In the underemployment visualization, there was a slight rebound in 2023. Meanwhile, in the full unemployment visualization, the decline continued smoothly without any reversal.

Based on the visualization, the unemployment trends from 2018 to 2023 are quite similar between full unemployment and underemployment. However, two key points stand out: the years 2020 and 2023.

Both categories—regardless of how you define them—experienced a significant spike in 2020. This could possibly be influenced by the population census conducted by the government that year, considering that data from other years may have been modeled based on previous census results. But who knows? That’s just a guess, as government-related technicalities are best understood by the authorities themselves. Not to mention how difficult it can be to find structured data in Indonesia due to its somewhat unorganized data collection system.

Regardless of the cause, for now, let’s assume that this dataset is reliable enough for the purpose of this analysis.

Returning to 2023, the underemployment data shows a slight reversal in the downward trend observed from 2020 to 2022. Meanwhile, the full unemployment data continues to decline gradually through 2023.

At first glance, this might seem like a positive development—fewer people are fully unemployed, while underemployment is slightly rising.

"That’s a good thing, right? Isn’t being underemployed better than being completely unemployed?"

WRONG.

It all depends on what kind of underemployment we’re talking about.

If underemployment refers to people working part-time, then yes, it’s better than being completely unemployed.

However, if underemployment means individuals working in jobs that do not utilize their skills, abilities, or potential, then it becomes a completely different issue—one that could signal a deeper problem in the job market.

Nevertheless, regardless of how much the numbers increase or decrease at certain points, men consistently remain at the top, while women follow below. However, the gap between them is not too significant.

Over time, is unemployment rising more in rural areas or increasing more in urban areas?

From the visualization, it appears that the ranking of the regions is completely reversed between the full unemployment and partial unemployment categories.

Regarding the rise and fall of the lines, the lines from urban areas, rural areas, or a combination of both have more or less the same movement direction and curvature. When the city's position rises, other areas also rise, both in the full unemployment and partial unemployment categories.

However, the main difference lies in which line is at the top in the visualization for each category.

In the full unemployment category, the top line is occupied by urban areas, followed by urban + rural areas, with rural areas at the lowest position. This is the complete opposite in the partial unemployment category. In the partial unemployment category, the top position is actually occupied by rural areas, followed by rural + urban areas, while urban areas are at the lowest position.

This shows that the situation in urban areas is indeed more chaotic compared to rural areas. It is clearly evident that full unemployment is highest in urban areas, while partial unemployment is actually highest in rural areas. Regardless of the form, being fully unemployed is far worse than being partially unemployed.

Is the development of unemployment data always relevant to a person's level of education?

Is the level of education relevant to the unemployment rate?

— Before presenting the visualization, I need to clarify the color gradient used in it. I have chosen a gradient from very dark blue (deep) to very light blue (bright). The darker the color, the higher the level of education. Conversely, the lighter the color, the lower the level of education.

For example, the lightest blue represents the line for the lowest education level: those who never attended school, did not complete elementary school, or completed only elementary school. Meanwhile, the darkest blue represents the line for the highest education level, which is university education.

I'm quite concerned after seeing that visualization. In the unemployment visualization, vocational high schools (SMK) consistently rank at the top above all others.

The primary goal of vocational high schools (SMK) is to produce graduates who are ready to enter the workforce immediately after graduation. However, the visualization clearly shows that they consistently top the chart among other lines. This clearly proves that the mission of SMKs to create job-ready workers has largely failed, considering the high rate of full unemployment is predominantly represented by this education level.

What's even more concerning about this full unemployment category is who occupies the lowest position in the visualization. Somehow, the lowest position is consistently held by those with the least education. If this trend continues, the effects could be extremely dangerous.

What I fear the most is that people, especially students, will lose faith in the value of higher education. If those with no schooling, only elementary school diplomas, or even those who didn't complete elementary school have the lowest unemployment rates, it could lead to public distrust in educational institutions and the fairness of job selection in the workforce.

Why do I believe job selection is unfair after seeing this visualization? The reasoning is simple—if we think logically, having a higher education should provide more opportunities and choices when it comes to securing a job.

Underemployment – a phrase that can be defined in multiple ways. Depending on which definition is adopted, the interpretation of the visualization category of underemployment will also vary accordingly.

If underemployment is defined as those working in fields that do not align with their skills and expertise, then the visualization of this category of underemployment can be interpreted as sending a rather positive signal. This is because those occupying the lowest position, even lower than other lines, are individuals with the highest levels of education—diploma and university graduates.

Diploma or university graduates receive higher vocational education. Those who graduate from these institutions have their skills honed in their respective fields much more deeply than those from vocational high schools. Therefore, if individuals who have studied at these institutions have a very low underemployment rate (meaning few of them work in jobs that do not match their skills), it can be interpreted that many graduates at this level of education are working in fields that align with the expertise they acquired during their studies.

Let's just hope that underemployment in this data truly means that, considering that the data does not provide any clarification on what kind of underemployment is being referred to.

Do young workers contribute more to the unemployment rate, or is it the older ones?

That's quite bad. The top position is always occupied by those in their most productive ages. What does it mean that the productive age group is actually the top scorer in unemployment? That’s clearly a nightmare. Younger individuals are highly valuable in the economy because they still have a lot of energy compared to older workers. Even though they may lack experience, having more young people employed would, in turn, increase the overall experience of the youth.

However, this visualization instead shows that the ones leading the unemployment rate—both in full unemployment and underemployment—are young people. This data visualization proves that the job market in Indonesia has failed to utilize a workforce that is actually abundantly available. That’s very concerning.

Are the top provinces with the highest unemployment rates always occupied by the same or similar provinces, or do they vary over time?

This is quite complicated considering the data structure in the provincial CSV file. The table in the file contains four columns: province, February, August, and year.

The task is to retrieve the top provinces, but the main issue arises when sorting the data to display the top 5 provinces. The problem is that even though we only want to see the top 5, provinces from the same year that are not in the top ranking will also be displayed. Therefore, I decided to make some modifications to the data before importing it into Tableau for visualization.

The plan is to use pandas since it is much more flexible for performing custom operations.

Load the file into a DataFrame.

import pandas as pd

dataFramePenuh = pd.read_csv("project1/provinsiUni.csv")

Split the DataFrame into multiple DataFrames based on the year.

# Take an independent copy for each year so that adding columns later
# does not trigger pandas' SettingWithCopyWarning.
penuh2018 = dataFramePenuh[dataFramePenuh["Tahun"] == 2018].copy()
penuh2019 = dataFramePenuh[dataFramePenuh["Tahun"] == 2019].copy()
penuh2020 = dataFramePenuh[dataFramePenuh["Tahun"] == 2020].copy()
penuh2021 = dataFramePenuh[dataFramePenuh["Tahun"] == 2021].copy()
penuh2022 = dataFramePenuh[dataFramePenuh["Tahun"] == 2022].copy()
penuh2023 = dataFramePenuh[dataFramePenuh["Tahun"] == 2023].copy()

Convert the data type of the February and August columns to numeric, then create a new column containing the average of these two columns.

# Apply the same conversion to every yearly DataFrame.
for frame in (penuh2018, penuh2019, penuh2020, penuh2021, penuh2022, penuh2023):
    # Values that cannot be parsed as numbers become NaN.
    frame["Februari"] = pd.to_numeric(frame["Februari"], errors="coerce")
    frame["Agustus"] = pd.to_numeric(frame["Agustus"], errors="coerce")
    # The average is NaN whenever either month is missing.
    frame["Rata rata"] = (frame["Februari"] + frame["Agustus"]) / 2

Sort the data in descending order based on the average column.

def top_five(frame, by="Rata rata"):
    # Sort descending by the given column, renumber the rows, keep the first five.
    return frame.sort_values(by=by, ascending=False).reset_index(drop=True)[:5]

penuh2023 = top_five(penuh2023)
penuh2022 = top_five(penuh2022)
penuh2021 = top_five(penuh2021)
penuh2020 = top_five(penuh2020)
penuh2019 = top_five(penuh2019)
# 2018 is ranked by the February column rather than the average.
penuh2018 = top_five(penuh2018, by="Februari")

Concatenate all DataFrames from each year and then save them in CSV format.

penuh = [penuh2023, penuh2022, penuh2021, penuh2020, penuh2019, penuh2018]

result = pd.concat(penuh, ignore_index=True)

result.to_csv("result.csv", index = False)
print(result)

The approach of using pandas in Python was applied because the provincial full unemployment data had already been converted into a vertical format, just like other datasets used for time-series visualizations. However, the provincial underemployment data was intentionally left in a horizontal format because I wanted to showcase a different method for visualization.

Two Methods for Visualizing the Top 5 Provinces by Year

First Method (Vertical Format - Pandas Approach)
- If the dataset is in a vertical format, pandas can be used to:
  - Split the dataset into multiple parts based on the year.
  - Sort each subset in descending order based on a specific parameter.
  - Extract the top-ranked entries.
  - Concatenate the filtered data and save the result as a CSV file.

Second Method (Horizontal Format - Tableau Approach)
- If the dataset remains in a horizontal format, then:
  - Multiple sheets in Tableau can be created, one for each year.
  - Sort each sheet in descending order.
  - Extract only the top-ranked entries for each sheet.
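
To make the difference between the two formats concrete, here is a rough sketch of how a horizontal table could be reshaped into a vertical one and then reduced to the top entries per year in a single pass. Everything in it is an assumption for illustration: the column names `Provinsi`, `Tahun`, and `Tingkat`, the placeholder provinces, and the numbers are hypothetical, and this is not the code used in the project.

```python
import pandas as pd

# Hypothetical horizontal (wide) table: one row per province, one column per year.
wide = pd.DataFrame({
    "Provinsi": ["A", "B", "C"],
    "2022": [8.1, 7.4, 5.0],
    "2023": [7.5, 7.9, 4.8],
})

# Horizontal -> vertical (long): one row per province-year pair.
long = wide.melt(id_vars="Provinsi", var_name="Tahun", value_name="Tingkat")

# Sort within each year (highest rate first), then keep the top 2 rows per year.
top = (
    long.sort_values(["Tahun", "Tingkat"], ascending=[True, False])
        .groupby("Tahun")
        .head(2)
        .reset_index(drop=True)
)
print(top)
```

Using `groupby(...).head(n)` avoids creating one DataFrame per year by hand, at the cost of needing the data in vertical format first.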

For the provincial underemployment data, I deliberately kept it in a horizontal format because I wanted to experiment with the second method.

After completing all the data processing for the provincial dataset, here is the visualization result.

From the visualization, it is clear that the top 5 positions are not always occupied by the same provinces year after year. Although the composition sometimes changes, some provinces appear in the top 5 list far more consistently than others.

For example, Banten, West Java, and Riau Islands have never been absent from the top 5 list in any given year. Meanwhile, other provinces in the list appear frequently, while some only show up once.
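
A claim like "never absent from the top 5" can also be checked directly from the concatenated top 5 table instead of reading it off the chart. The sketch below assumes the table has a year column and a province column; the names `Tahun` and `Provinsi` and the placeholder provinces A to D are hypothetical and are not the real results.

```python
import pandas as pd

# Hypothetical top-list table (placeholder provinces, not real rankings).
top = pd.DataFrame({
    "Tahun": [2021, 2021, 2021, 2022, 2022, 2022, 2023, 2023, 2023],
    "Provinsi": ["A", "B", "C", "A", "B", "D", "A", "B", "C"],
})

n_years = top["Tahun"].nunique()

# Number of distinct years in which each province appears in the list.
appearances = top.groupby("Provinsi")["Tahun"].nunique()

always_present = sorted(appearances[appearances == n_years].index)
only_once = sorted(appearances[appearances == 1].index)

print("In the list every year:", always_present)
print("In the list only once:", only_once)
```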

As I mentioned earlier, the underemployment dataset was intentionally left in its horizontal format and was not subjected to any transformations using pandas, unlike the full unemployment data.

However, after successfully creating the visualization entirely in Tableau, the difference between this visualization and the one for provincial full unemployment becomes clearly noticeable.

For example, it is not possible to list which provinces were present in certain years and absent in others, because the dataset used is the complete dataset: every province in it is displayed, not only the ones that made the top 5.

Moreover, creating the visualization requires more than one sheet, as each sheet is used for only one year's data. Since the technique involves creating multiple sheets, sorting them in descending order, and selecting the top 5 from each sheet, the number of sheets increases, making sheet management in Tableau somewhat more complex.

But regardless of all that, the pattern in the visualization remains more or less the same: a fairly stable group of provinces fills the top 5 each year. The only difference is that the provinces in that group are different from those in the full unemployment category.

- CONCLUSION -

Before delving further into the results of this data analysis, I want to emphasize once again that the data used in this analysis is not entirely accurate. This is because BPS (Statistics Indonesia) itself acknowledges that some of the data used in this analysis is merely a projection, generated through algorithms designed to estimate the actual values.

Therefore, I hope that readers clearly understand that this data analysis does not represent the real conditions in the field. Instead, it is an analysis based on the assumption that all data presented here is considered accurate and reliable.

Unemployment in Indonesia is quite concerning. Based on the visualizations created, around 5 to 6 percent of the recorded population (presumably those ready to work) is unemployed. While this percentage may seem small at first glance, the actual numbers paint a far more alarming picture.

As a simple example, the visualization shows that in 2018, 5.20 percent of the recorded population were unemployed. If we cross-reference this with external data from BPS, there were around 7 million people classified as openly unemployed in Indonesia in 2018.

7 million is not a small number, and the individuals included in this data are not children or newborns. These 7 million people are individuals who have entered working age but remain completely unemployed.

That’s just the data from 2018. Considering that Indonesia's population keeps increasing, the actual number of unemployed people is likely to rise as well.

If the percentage remains roughly the same at 5 to 6 percent, wouldn’t that mean the actual number of unemployed individuals will grow even larger as the base population increases?
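
The arithmetic behind this question can be sketched in a few lines. The 5.20 percent rate and the figure of roughly 7 million come from the 2018 numbers quoted above; the 1 percent yearly growth of the base population is purely an illustrative assumption, not a BPS figure, and the variable names are hypothetical.

```python
# Figures quoted in the text for 2018.
rate = 0.052               # 5.20 percent unemployed
unemployed = 7_000_000     # roughly 7 million openly unemployed

# Base population implied by those two figures (about 134.6 million).
base = unemployed / rate
print(f"Implied base population: {base / 1e6:.1f} million")

# ASSUMPTION: the base grows 1 percent per year while the rate stays at 5.20 percent.
assumed_growth = 0.01
projected = []
for years_ahead in range(1, 6):
    future_base = base * (1 + assumed_growth) ** years_ahead
    projected.append(future_base * rate)

for years_ahead, value in enumerate(projected, start=1):
    print(f"+{years_ahead} years: about {value / 1e6:.2f} million unemployed")
```

Under a constant rate, the number of unemployed people grows exactly as fast as the base population, so any growth in the base translates directly into more unemployed individuals.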

That’s just the overall unemployment figure. While it does represent a major part of this data analysis, the real concern goes beyond just knowing that millions of people are unemployed in Indonesia.

If we dig deeper, there are even more alarming issues than just the fact that millions are jobless.

Let’s start with unemployment based on residential areas.

According to the data, open unemployment rates are higher in urban areas, whereas underemployment is more dominant in rural areas.

What does this mean?

Urban areas, which are supposed to be hubs for job opportunities, are instead producing more unemployed people.
Rural areas, on the other hand, have higher rates of underemployment than full unemployment.

While underemployment is not ideal, isn’t it still better than being completely unemployed?

As seen in the first visualization, the bar representing job seekers is always FAR HIGHER than the bar for job vacancies, not to mention job placements, which clearly fail to fully absorb the workforce.

The negative aspects already highlighted are further reinforced by the fact that education level in Indonesia appears to have little bearing on whether someone will be unemployed or not.

If higher education truly guaranteed employment, then in the full unemployment visualization, the lowest position should always be occupied by graduates and degree holders.

However, the reality is they are positioned in the middle instead.

What's even more concerning is that, upon closer inspection, the top position in underemployment is occupied by those with the lowest education levels.

The underemployment sector is dominated by individuals with low education.
Meanwhile, degree holders are stuck in the middle tier of full unemployment.

What’s going on here?

Based on my personal observations as someone who has lived in Indonesia since birth, this situation likely occurs because connections (networking) hold more value than higher education.

If my assumption is correct, this will have severe consequences for Indonesia's future. Placing individuals in important positions based on connections rather than competence hinders an optimal system and ultimately cripples progress.

The bad news in this data analysis keeps pouring in like an endless storm, growing even more intense. From the age perspective, full unemployment is once again dominated by the productive-age population. If a group of people who should be at their peak productivity is instead left unemployed, isn't that a serious problem?

And the final piece of bad news: if we take a closer look, the TOP 5 provinces with the highest full unemployment rates are actually some of the most densely populated regions.

Banten, West Java, Jakarta, and Riau Islands: these areas, especially West Java and its surroundings, are widely known as prime destinations for "Merantau".

What is "Merantau"?

"Merantau" is a cultural practice where people leave their hometowns to seek better job opportunities elsewhere.

Yet, if a place famous for being a job-seeking destination is actually harboring alarmingly high unemployment rates, isn’t that a bitter reality?

There is a saying that everything has its good and bad sides.

Unfortunately, that doesn’t apply here.

- High unemployment rates
- Education failing to guarantee jobs
- And all the other harsh realities I’ve outlined

Sadly, it really is that bad.

Sometimes, reality is bitter, and we seek distractions just to make our suffering feel a little less unbearable.

We cling to lies that we’ve convinced ourselves are truths, yet in the end, the "sweet truths" we believed in slowly betray us, one by one.

Thank you for reading my simple analysis.
