Workers vs. the Unemployed

IN INDONESIA

This project contains data analysis from several datasets on unemployment and workers in Indonesia. Some of the data used in this analysis may have biases related to technical issues and the possibility of political interference behind it. Therefore, please understand that the results of this analysis are likely to be inaccurate, as I am aware that assumptions built upon other assumptions will lead to conclusions that deviate further from actual reality.

However, if you are willing to read this project further, let us continue. I sincerely thank you for taking the time to read this simple project.

The data to be used in this project contains many biases for several reasons:

  1. Some of the data provided by BPS are backcast results. Backcasting itself is a method that only produces estimated values rather than actual data.

  2. Even if the data is not a result of backcasting, it may still be inaccurate due to several technical issues. As is widely known and considered an open secret in Indonesia, survey respondents sometimes provide inaccurate information to influence the government into granting them aid.

    One example of manipulation by some individuals is reporting that a family member is unemployed and still young. This is often done to qualify for financial assistance known as the "Prakerja" program.

Now that you are aware of these biases and limitations, let's move on to the project and examine the data in more detail.

The purpose of this data analysis is to understand the movement of unemployment and employment rates.

So the main question is:
Is the number of workers in Indonesia increasing (which would be a good sign), or is it decreasing (which would be very bad)?

Data -

After exploring the BPS (Badan Pusat Statistik) website, I obtained 12 types of data to support our analysis this time:

  1. Employment and unemployment data.

  2. Open unemployment data - by region.

  3. Open unemployment data - by gender.

  4. Open unemployment data - by education level.

  5. Open unemployment data - by province.

  6. Open unemployment data - by age group.

  7. Underemployment data - by region.

  8. Underemployment data - by gender.

  9. Underemployment data - by education level.

  10. Underemployment data - by province.

  11. Underemployment data - by age group.

  12. Core data - job vacancy fulfillment.

Employment and unemployment data -

This dataset contains the number of employed and unemployed people in Indonesia.

  • Category / status : employment, unemployment.

  • Year.

  • Time / month : February, August, Yearly.

  • Percentage.

  • Number (thousand).

Open unemployment data - by region

This dataset contains the number of people who are not working at all and are looking for a job - filtered by the region they live in.

  • Category / status : city, village / town.

  • Year.

  • Percentage.

Open unemployment data - by gender

This dataset contains the number of people who are not working at all and are looking for a job - filtered by gender.

  • Category / status : male, female.

  • Year.

  • Percentage.

Open unemployment data - by education level

This dataset contains the number of people who are not working at all and are looking for a job - filtered by education level.

  • Category / status :
    - Never attended school, did not complete elementary school, or completed elementary school.
    - Junior high school.
    - General high school.
    - Vocational high school.
    - Diploma 1/2/3.
    - University.

  • Year.

  • Percentage.

Open unemployment data - by province

This dataset contains the number of people who are not working at all and are looking for a job - filtered by province.

DATA CLEANING -

In this process, what is usually called data cleaning is more accurately referred to as data reorganization. This is because the data obtained from BPS is already clean. However, its format or structure may not be optimal for analysis, so some adjustments are necessary.
I will not explain each file one by one or detail every step of the reorganization process. Instead, I will outline a series of steps I take when encountering specific conditions in the data.

THE KEY STEPS ARE :

Remove data that is irrelevant to the information provided by the dataset

Select the data range, import it into Power Query, and apply transformations based on the predefined considerations as follows:

If the data is in the form of a pivot table, I will unpivot it.

Rotate to unpivot

If unpivoting the data results in empty cells, I will fill them downward.

If any headers become empty after restructuring, I will add appropriate labels.
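The steps above are carried out in Power Query. As a rough illustration only, the same three operations (unpivot, fill down, label an empty header) can be sketched in plain Python. The sample rows, the column names, and the helper names `unpivot`, `fill_down`, and `label_header` are hypothetical and are not taken from the BPS files.

```python
# Hypothetical wide table shaped like a pivot-style export: the first header
# is empty, and a merged "status" cell left one blank cell below it.
header = ["", "gender", "2018", "2019"]
rows = [
    ["unemployment", "male", 5.4, 5.2],
    ["", "female", 5.3, 5.1],
]


def unpivot(header, rows, id_count):
    """Turn every column after the first `id_count` into (year, percentage) records."""
    records = []
    for row in rows:
        ids = dict(zip(header[:id_count], row[:id_count]))
        for year, value in zip(header[id_count:], row[id_count:]):
            records.append({**ids, "year": int(year), "percentage": value})
    return records


def fill_down(records, key):
    """Copy the last non-empty value of `key` into the empty cells below it."""
    last = None
    for record in records:
        if record[key] in ("", None):
            record[key] = last
        else:
            last = record[key]
    return records


def label_header(records, old, new):
    """Rename the column `old` (here the empty header) to `new`."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in records]


long_table = unpivot(header, rows, id_count=2)
long_table = fill_down(long_table, key="")
long_table = label_header(long_table, old="", new="status")

for record in long_table:
    print(record)
```

Each year column becomes its own row, so the result has one record per status, gender, and year.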

Data modeling -

The data I use for this analysis comes from multiple separate files. For example, the Employment and unemployment data comes in 6 files. Each file contains the employment and unemployment numbers recorded for a given year, so the 6 files cover 6 years: 2018, 2019, 2020, 2021, 2022, and 2023.

Before proceeding to the data visualization step, it is best to first combine data from multiple files into their respective categories based on their relationships.

If a data category is better suited for the append rows method, then all files should be combined using this approach. Conversely, if a data category is better suited for the merge method, then merging should be applied accordingly.

Open unemployment data - by age

This dataset contains the number of people who are not working at all and are looking for a job - filtered by age.

  • Category / status : age group.

  • Year.

Underemployment data - by region

This dataset contains the number of people who are not working full time or not using their full skills - filtered by region.

Underemployment data - by gender

This dataset contains the number of people who are not working full time or not using their full skills - filtered by gender.

  • Category / status : male, female.

  • Year.

  • Percentage.

Underemployment data - by education level

This dataset contains the number of people who are not working full time or not using their full skills - filtered by education level.

  • Category / status :
    - Never attended school, did not complete elementary school, or completed elementary school.
    - Junior high school.
    - General high school.
    - Vocational high school.
    - Diploma 1/2/3.
    - University.

  • Year.

  • Percentage.

Underemployment data - by province

This dataset contains the number of people who are not working full time or not using their full skills - filtered by province.

  • Category / status : province.

  • February

  • August

Underemployment data - by age

This dataset contains the number of people who are not working full time or not using their full skills - filtered by age.

  • Category / status : age group.

  • Year.

  • Category / status : city, village / town.

  • Year.

  • Percentage.

Regardless of the type or category of data being combined, the key point is that the data used in this analysis is time series data, whether monthly or yearly. The main focus of this analysis is to observe changes in a variable's values over a specific time interval.
Since the data to be analyzed consists of changes in a variable's value over a specific time interval, the data merging method should also depend on how the data is presented over time. For example:

In the "by region" dataset, the time intervals are arranged across multiple columns (horizontally), such as Column 2018, Column 2019, and so on. Therefore, the most suitable merging method is Merge, using the matching rows of the region column as the key.

In the second example, the "Employment and unemployment" dataset is structured vertically, with a column that records the year for each row. This means the time intervals are arranged vertically. Therefore, the recommended merging method is Append, combining rows from other files accordingly.

However, in the main dataset, the case is slightly different. This dataset has neither time intervals spread across columns nor a column explicitly indicating the time interval for each row. Therefore, an additional preprocessing step is required.

First, add a Year column (since each file represents a specific year). For example, in the 2018 file, fill the Year column with 2018 for all rows, and do the same for other files accordingly.

Once this step is completed, the data can then be merged using the Append method.

And just as I did in the data cleaning process, I will not explain each step in detail for every file. I believe the explanation above is sufficient to describe the steps I will take based on the situations mentioned and the considerations outlined. However, I will still present the final results.

*I will use Power Query for all dataset merging processes. I will also remove all annual rows from any dataset because their content is "-", and I am unsure how to fill them appropriately. If I were to fill them with the average of data from February and August, I believe that would be unfair, considering that the average would be based on only two samples, while the total population in this context represents 12 months. Therefore, I have decided to remove the annual rows entirely.
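The Merge case, the Append case, the added Year column, and the removal of annual "-" rows can be sketched in plain Python as follows. This is only an illustration of the logic, not the Power Query steps themselves; the function names, column names, and figures are hypothetical.

```python
# Case 1: years spread across columns, one file per year, same regions in each.
# Combine by matching rows on the key column, as a Merge would.
by_region_2018 = [{"region": "city", "2018": 6.4}, {"region": "village", "2018": 4.0}]
by_region_2019 = [{"region": "city", "2019": 6.3}, {"region": "village", "2019": 3.9}]


def merge_on(key, *tables):
    """Join tables row by row on `key`, keeping every column from each table."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Case 2: each row already carries its year, so rows are stacked, as an Append would.
def append_rows(*tables):
    """Stack the rows of all tables into one list."""
    return [dict(row) for table in tables for row in table]


# Case 3: no time information at all. Add a Year column per file, then append.
def add_year(table, year):
    """Return a copy of the table with a constant `year` column."""
    return [{**row, "year": year} for row in table]


def drop_placeholder_rows(table, column, placeholder="-"):
    """Remove rows whose value is only a placeholder, instead of estimating it."""
    return [row for row in table if row[column] != placeholder]


main_2018 = [{"province": "Aceh", "job_seekers": 100}]
main_2019 = [{"province": "Aceh", "job_seekers": 120}]

merged = merge_on("region", by_region_2018, by_region_2019)
appended = append_rows(add_year(main_2018, 2018), add_year(main_2019, 2019))

periods = [{"period": "February", "value": 5.0}, {"period": "Yearly", "value": "-"}]
cleaned = drop_placeholder_rows(periods, "value")
```

Dropping the placeholder rows, rather than averaging February and August, keeps the table limited to values that were actually reported.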

ASKING -

Typically, asking the right questions comes at the beginning of the data analysis workflow. However, in this project, the questioning phase comes after the data has been cleaned. Why? Because the data source in this project is relatively incomplete. Therefore, I decided that the process used in this data analysis would be slightly different.

In this case, the process of "asking questions" in data analysis is better approached by posing the core questions first, followed by technical and detailed questions later, after all the data has been cleaned and gathered.

With this workflow, I believe the core questions are answered first, while technical and detailed questions can follow later. I think this approach is reasonable and well-accepted.

The core question is: "Has the trend of unemployment rates in Indonesia over time been getting better (meaning the unemployment rate is decreasing) or getting worse (meaning it is increasing)?"

Considering the main question, which aims to identify data trends, and the data obtained from BPS, the following questions are worth considering:

  • Is job fulfillment in Indonesia effective?

  • The development of unemployment rates vs. workers over time.

  • Over time, which gender has more unemployed individuals, male or female?

  • Over time, is unemployment rising more in rural areas or increasing more in urban areas?

  • Is the development of unemployment data always relevant to a person's level of education?

  • Do young workers contribute more to the unemployment rate, or is it the older ones?

  • Are the top provinces with the highest unemployment rates always occupied by the same or similar provinces, or do they vary over time?

Answering with data visualization -

Is job fulfillment in Indonesia effective?

Tableau will be chosen for the data visualization part of this project.

Is the fulfillment of labor for available job opportunities effective in Indonesia? To answer this question, the main dataset will be used, as it contains relevant information to provide an answer. Columns such as job seekers, job vacancies, and labor fulfillment are clearly present in this dataset.

There are three main categories in this dataset: job seekers, job vacancies, and labor fulfillment. I did not receive accurate information from the data author regarding the exact meaning of labor fulfillment in this dataset. However, if it is referred to as job placement/labor fulfillment, it usually pertains to job vacancies or positions that have been successfully filled. Therefore, this will serve as the primary benchmark in the analysis of this data.

*Important Note: It is crucial to exclude the "Indonesia" row from the visualization since this data visualization will focus on provincial blocks rather than Indonesia as a whole. For now, the visualization will be more concentrated on each individual province.

The main dataset consists of 2 pathways and 3 classes. What I mean by "2 pathways" are the routes that will be used in the data analysis for this dataset.

For this main dataset, analysis can be conducted if the data is examined through one or both of the available pathways. The first is through the recorded region (in this dataset, the province), and the second is through the time sequence.

Meanwhile, the 3 classes refer to male, female, and total. The total class itself contains the sum of the male and female data. The total class is necessary to assess how much the difference between male and female numbers influences the overall result.
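Two small checks follow from this structure: the national "Indonesia" row has to be filtered out before plotting provinces, and the total class should equal male plus female in every row. A minimal sketch in plain Python, assuming one row per province with hypothetical column names and made-up figures:

```python
# Hypothetical rows: one per province plus a national "Indonesia" row.
# The figures are invented for illustration and are not BPS values.
rows = [
    {"province": "Indonesia", "male": 900, "female": 700, "total": 1600},
    {"province": "Aceh", "male": 500, "female": 400, "total": 900},
    {"province": "Bali", "male": 400, "female": 300, "total": 700},
]

# Keep provincial rows only, so the national aggregate is not plotted as a province.
provinces = [row for row in rows if row["province"] != "Indonesia"]

# Sanity check: the total class should equal male + female in every row.
mismatches = [
    row["province"]
    for row in provinces
    if row["male"] + row["female"] != row["total"]
]

print(len(provinces), mismatches)
```

An empty `mismatches` list means the total class can be treated as a plain sum of the other two classes.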

Since there are two available pathways, I want to start with the recorded region pathway (province). After that, I will proceed with the time series pathway.

Based on the data presented in the visualization above, it is clear that something concerning is happening. Regardless of the class, the number of job seekers is significantly higher than the number of available job openings. This is followed by a slightly lower number of job placements, which, while closer to the number of available job openings, still does not reach the required level.

From this visualization, we can conclude that the number of job seekers in Indonesia is extremely high. However, this high number is not matched by an adequate number of job opportunities. Even when job openings are available, not all of them fully absorb the labor force.
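The comparison above can also be expressed as simple ratios. The sketch below is one possible way to quantify it, not the method used for the visualization; the function name and the figures are hypothetical.

```python
def fulfillment_ratios(job_seekers, job_openings, job_placements):
    """Return (openings per seeker, share of openings filled, share of seekers placed)."""
    return (
        job_openings / job_seekers,
        job_placements / job_openings,
        job_placements / job_seekers,
    )


# Made-up figures for one province; not taken from the BPS data.
per_seeker, filled, placed = fulfillment_ratios(
    job_seekers=1000, job_openings=400, job_placements=300
)

print(f"openings per seeker: {per_seeker:.0%}")
print(f"openings filled:     {filled:.0%}")
print(f"seekers placed:      {placed:.0%}")
```

Under this reading, fulfillment is fully effective only when the first two ratios both approach 100%: enough openings for the seekers, and every opening filled.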

If this situation continues, the unemployment rate in Indonesia will continue to rise.

After analyzing the data on total, female, and male categories in terms of job openings, job seekers, and job placements by province, let’s continue with the analysis of job openings, job seekers, and job placements for males, females, and the total, also based on province. This will help identify which category plays a dominant role in shaping job openings, job seekers, and job placements data.

Based on the visualization above, it can be inferred that whether in job openings, job placements, or job seekers data, the majority (in almost all provinces) is consistently dominated by males, while females follow behind.

Based on the visualization above, it can indeed be concluded that the majority of the data is dominated by males. However, these figures only provide an overview without detailed insights. Therefore, in the next visualization, a deeper analysis will be conducted to determine the exact male and female shares of the data.

Additionally, the next visualization will incorporate a time series parameter to observe the development of gender dominance over the years. This will help determine whether males consistently dominate the data significantly or if, in certain years, their dominance weakens.
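The percentage shares referred to here are each gender's count divided by the yearly total. A minimal sketch, assuming yearly male and female counts are available; the numbers and the function name are hypothetical.

```python
# Hypothetical counts per year; the real values come from the main dataset.
counts = {
    2018: {"male": 520, "female": 480},
    2019: {"male": 490, "female": 510},
}


def gender_shares(counts):
    """Return each gender's percentage share of the yearly total."""
    shares = {}
    for year, by_gender in counts.items():
        total = sum(by_gender.values())
        shares[year] = {
            gender: round(100 * n / total, 1) for gender, n in by_gender.items()
        }
    return shares


shares = gender_shares(counts)
for year, by_gender in shares.items():
    print(year, by_gender)
```

Note that a share and a gap are different quantities: a 60% share for one gender means its count is 1.5 times the other's, not 60% higher.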

In the job seeker category, males slightly dominated the data in 2018. However, in 2019, females overtook the dominance, though only by a small margin. In contrast, in 2020 and 2023, males dominated with a noticeably wider gap, unlike the close competition between males and females in 2018 and 2019.

Job Openings Data – From 2018 to 2019, females briefly dominated with a noticeable gap compared to males. However, in the 2020 and 2023 job openings data, the visualization shows a significant shift, with males taking the lead by a substantial margin. In those two years, male dominance even exceeded 60 percent. Personally, I consider that a strong enough figure to indicate a drastic difference.

Job Placement Data – Its pattern is not much different from the job openings data. From 2018 to 2019, the data was dominated by females, while from 2020 to 2023, males took the lead. The only noticeable difference is that in 2018 and 2019, female dominance in job placements was slightly higher than in job openings during the same years.

In job openings data for 2018 and 2019, female dominance ranged between 53% and 56%. Meanwhile, in the same years, job placement data showed an even higher female dominance, ranging between 54% and 58%.

Is job fulfillment in Indonesia effective? The answer is clearly NO. Even from the first visualization, it is evident that the number of job seekers is FAR HIGHER than the available job openings. On top of that, the available job positions are not even fully occupied.

Knowing that males dominate the data is just a bonus insight because, at most, the male share reaches only around 60%. This means there is no sampling bias where one side is disproportionately represented in the data.

But let’s return to the main question: Is job fulfillment in Indonesia effective? Considering that the number of job seekers is overwhelmingly high compared to the available job openings, and that not all job positions can absorb the workforce to their full capacity, my final answer is a definite NO!

The development of unemployment rates vs. workers over time.

Actually, it would be inaccurate to consider this a question because the phrase "The development of unemployment rates vs. workers over time." seems more appropriate as a task description. Rather than a question, it is better described as an analytical assignment that involves examining the data.

However, regardless of what it is classified as, one thing is clear: the data from the Employment and unemployment dataset is essential for carrying out this task.

Based on the pie chart generated from the data, it is evident that the ratio of employed individuals to unemployed individuals has remained relatively consistent from year to year. Even when differences do exist, the gap is only around 1%. This indicates that there has been no significant change in this disparity over the years.
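To make the year-to-year comparison concrete, the employed share can be computed per year and the change expressed in percentage points. This is a sketch under assumed column names, with invented figures in thousands rather than the BPS values.

```python
# Hypothetical figures in thousands; not the BPS values.
labour = {
    2018: {"employment": 9500, "unemployment": 500},
    2019: {"employment": 9600, "unemployment": 400},
}


def employed_share(labour):
    """Percentage of the labour force that is employed, per year."""
    return {
        year: round(100 * v["employment"] / (v["employment"] + v["unemployment"]), 2)
        for year, v in labour.items()
    }


shares = employed_share(labour)
years = sorted(shares)

# Change from each year to the next, in percentage points.
changes = {b: round(shares[b] - shares[a], 2) for a, b in zip(years, years[1:])}

print(shares)
print(changes)
```

Reporting the change in percentage points avoids confusing a one-point shift in the share with a 1% relative change in the number of workers.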

Over time, which gender has more unemployed individuals, male or female?

For this question and the ones that follow, two datasets will be used: one representing full (open) unemployment and the other representing underemployment. This approach is necessary because the collected data comes from these two different categories.

However, before proceeding with the visualization, it is advisable to transform any time-sequenced data that is structured horizontally into a vertically structured time-series format, with some column name adjustments to ensure consistency.

This transformation is necessary to simplify the process of comparing the development of multiple values over a specific time series. The type of visualization used for comparison will be created using the dual-axis method.
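A minimal sketch of that horizontal-to-vertical reshaping, applied to both datasets so they share one set of column names and can be compared on the same time axis. The table layout, the `measure` column, and all values are hypothetical.

```python
# Hypothetical wide tables: one entry per gender, one column per year.
full_unemployment = {
    "male": {"2018": 5.4, "2019": 5.2},
    "female": {"2018": 5.3, "2019": 5.1},
}
underemployment = {
    "male": {"2018": 6.5, "2019": 6.3},
    "female": {"2018": 6.8, "2019": 6.6},
}


def to_long(wide, measure):
    """One record per (gender, year), using the same column names for every dataset."""
    return [
        {"gender": gender, "year": int(year), "measure": measure, "percentage": value}
        for gender, by_year in wide.items()
        for year, value in by_year.items()
    ]


long_table = to_long(full_unemployment, "full unemployment") + to_long(
    underemployment, "underemployment"
)
long_table.sort(key=lambda r: (r["measure"], r["gender"], r["year"]))

for record in long_table:
    print(record)
```

With a single `year` column and a `measure` column, each measure can be placed on its own axis while sharing the same time dimension.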

Two visualizations represent the two datasets: full unemployment and underemployment.

In the visualization of underemployment trends over the years, the values remained relatively stable from 2018 to 2019, with only a slight decline in 2019. However, in 2020, there was a significant spike. After that, the values gradually decreased until 2022, followed by a slight rebound in 2023.

The full unemployment visualization follows a similar pattern to the underemployment visualization. However, the key difference lies in the 2022 to 2023 values.

In the underemployment visualization, there was a slight rebound in 2023. Meanwhile, in the full unemployment visualization, the decline continued smoothly without any reversal.

Based on the visualization, the unemployment trends from 2018 to 2023 are quite similar between full unemployment and underemployment. However, two key points stand out: the years 2020 and 2023.

Both categories—regardless of how you define them—experienced a significant spike in 2020. This could possibly be influenced by the population census conducted by the government that year, considering that data from other years may have been modeled based on previous census results. But who knows? That’s just a guess, as government-related technicalities are best understood by the authorities themselves. Not to mention how difficult it can be to find structured data in Indonesia due to its somewhat unorganized data collection system.

Regardless of the cause, for now, let’s assume that this dataset is reliable enough for the purpose of this analysis.

Returning to 2023, the underemployment data shows a slight reversal in the downward trend observed from 2020 to 2022. Meanwhile, the full unemployment data continues to decline gradually through 2023.

At first glance, this might seem like a positive development—fewer people are fully unemployed, while underemployment is slightly rising.

"That’s a good thing, right? Isn’t being underemployed better than being completely unemployed?"

WRONG.

It all depends on what kind of underemployment we’re talking about.

If underemployment refers to people working part-time, then yes, it’s better than being completely unemployed.

However, if underemployment means individuals working in jobs that do not utilize their skills, abilities, or potential, then it becomes a completely different issue—one that could signal a deeper problem in the job market.

Nevertheless, regardless of how much the numbers increase or decrease at certain points, men fairly consistently remain at the top, while women follow below. However, the gap between them is not too significant.

Over time, is unemployment rising more in rural areas or increasing more in urban areas?

From the visualization, it appears that the ranking of the regions is completely reversed between the full unemployment and partial unemployment categories.

Regarding the rise and fall of the lines, the lines from urban areas, rural areas, or a combination of both have more or less the same movement direction and curvature. When the city's position rises, other areas also rise, both in the full unemployment and partial unemployment categories.

However, the main difference lies in which line is at the top in the visualization for each category.

In the full unemployment category, the top line is occupied by urban areas, followed by urban + rural areas, with rural areas at the lowest position. This is the complete opposite in the partial unemployment category. In the partial unemployment category, the top position is actually occupied by rural areas, followed by rural + urban areas, while urban areas are at the lowest position.

This shows that the situation in urban areas is indeed more chaotic compared to rural areas. It is clearly evident that full unemployment is highest in urban areas, while partial unemployment is actually highest in rural areas. Regardless of the form, being fully unemployed is far worse than being partially unemployed.

Is the development of unemployment data always relevant to a person's level of education?

Is the level of education relevant to the unemployment rate?

— Before presenting the visualization, I need to clarify the color gradient used in it. I have chosen a gradient from very dark blue (deep) to very light blue (bright). The darker the color, the higher the level of education. Conversely, the lighter the color, the lower the level of education.

For example, the lightest blue represents the line for the lowest education level: those who never attended school, did not complete elementary school, or completed only elementary school. Meanwhile, the darkest blue represents the line for the highest education level, which is university education.

I'm quite concerned after seeing that visualization. In the unemployment visualization, vocational high schools (SMK) consistently rank at the top above all others.

The primary goal of vocational high schools (SMK) is to produce graduates who are ready to enter the workforce immediately after graduation. However, the visualization clearly shows that they consistently top the chart among other lines. This clearly proves that the mission of SMKs to create job-ready workers has largely failed, considering the high rate of full unemployment is predominantly represented by this education level.

What's even more concerning about this full unemployment category is who occupies the lowest position in the visualization. Somehow, the lowest position is consistently held by those with the least education. If this trend continues, the effects could be extremely dangerous.

What I fear the most is that people, especially students, will lose faith in the value of higher education. If those with no schooling, only elementary school diplomas, or even those who didn't complete elementary school have the lowest unemployment rates, it could lead to public distrust in educational institutions and the fairness of job selection in the workforce.

Why do I believe job selection is unfair after seeing this visualization? The reasoning is simple—if we think logically, having a higher education should provide more opportunities and choices when it comes to securing a job.

Underemployment – a phrase that can be defined in multiple ways. Depending on which definition is adopted, the interpretation of the visualization category of underemployment will also vary accordingly.

If underemployment is defined as those working in fields that do not align with their skills and expertise, then the visualization of this category of underemployment can be interpreted as sending a rather positive signal. This is because those occupying the lowest position, even lower than other lines, are individuals with the highest levels of education—diploma and university graduates.

Diploma or university graduates receive higher vocational education. Those who graduate from these institutions have their skills honed in their respective fields much more deeply than those from vocational high schools. Therefore, if individuals who have studied at higher vocational institutions have a very low underemployment rate (meaning they work in jobs that do not match their skills), it can be interpreted that many graduates at this level of education are working in fields that align with the expertise they acquired during their studies.

Let’s just hope that underemployment in this data truly means that, considering that the data does not provide any clarification on what kind of underemployment is being referred to.

Do young workers contribute more to the unemployment rate, or is it the older ones?

That's quite bad. The top position is always occupied by those in their most productive ages. What does it mean that the productive age group is actually the top scorer in unemployment? That’s clearly a nightmare. Younger individuals are highly valuable in the economy because they still have a lot of energy compared to older workers. Even though they may lack experience, having more young people employed would, in turn, increase the overall experience of the youth.

However, this visualization instead shows that the ones leading the unemployment rate—both in full unemployment and underemployment—are young people. This data visualization proves that the job market in Indonesia has failed to utilize a workforce that is actually abundantly available. That’s very concerning.
