Sunday, March 26, 2017

Women in PL_IT report

Hey!
Today I thought I would cover a topic that is not directly related to coding but revolves around women in technology. I decided to do that after finding a report that analyzes the situation of women in the Polish IT market in 2016.
I would like to share the general insights from this document and compare them with a prediction made by Deloitte for 2016.

Although Deloitte's document is presented as a prediction, the only actual prediction it makes is that by the end of 2016 fewer than 25% of IT jobs in developed countries will be held by women. Apart from that, the report is a collection of valuable insights from individual studies. Its conclusions are rather pessimistic: it states that the number of women enrolled in IT-related education has decreased over the years, that women tend to earn less than men (a US-based female web developer earns 79 cents for every dollar earned by a man in the same position), and that women are a lot less likely to be hired for senior positions, a disadvantage that runs from the targeting of job ads to the actual hiring process.
These are sad statistics that should be considered while designing steps to actually change this situation. But how do women perceive their situation? I believe that this is what the Polish report puts the emphasis on.
According to this report, almost 75% of women recommend working in IT, so maybe it is not that bad for them after all?
Deloitte claims that one of the main reasons for the current situation of women in technology is education. It seems, however, that they do not take into account the possibility of working in IT without an academic degree in the field. Responses in the Polish report indicate that most of the women working in IT do not have a degree from an IT-related program. Women also believe that their interest in IT results from their own determination. This is an interesting insight, but the growing number of people working in IT without a related degree may be a more general trend that is not restricted to women.
The Polish report also confirms some of the remarks made by Deloitte's report, namely:

  • women think that they earn less than men 
  • it is more difficult for women to make a career in IT 
  • 56% of women who responded to the survey confirmed that they experienced career difficulties resulting from their gender

So far, it seems that even if women's personal views about working in IT are less striking than the claims in Deloitte's report, they still reflect the same general trend.
I've got one more conclusion that, surprisingly, is positive. Judging by the opinions of experts in the Polish IT business, women's influence in the IT sector is rising, and some of the statistics may look depressing simply because it has not been long since women started becoming more attracted to programming and IT careers. Hopefully, this is the case. Unfortunately, we need to wait a couple of years to verify this thesis, but there is still a lot that can be done in the meantime.

As a person who used to think that computing and programming were some kind of sorcery, I am very keen on encouraging others to give it a try and see how it goes. I think that there is an unnecessary air of mystery around programming, and the best way to overcome one's fear is just to try. This post may seem superficial, but I believe that this topic is really complex. I hope that even just reporting the current situation and some minor thoughts about it can encourage other people to think about this issue.

Let me know what you think! 
And here is another report related to the subject, in case you're interested :).



Saturday, March 18, 2017

K Means clustering

Let's start coding!
Today I would like to introduce you to a very basic algorithm that is widely used in data mining and machine learning. It's K-Means, of course!
There is not much advanced math involved in computing it either! We basically just need to know what Euclidean distance is and how to properly filter the data.
You may be thinking that you have seen the equation for K-Means and that it looks so difficult that you need superpowers to understand it. That's what I thought too, but it turns out to be much easier once you get rid of the formal description. I will not try to explain the complicated maths and theory behind this algorithm (should I?). This will be more of an introduction with some hands-on practice.
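To make the one piece of math we rely on concrete, here is a minimal sketch of Euclidean distance in plain Python. The function name `euclidean_distance` is just a name chosen for this illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("both points must have the same number of features")
    # Sum the squared differences feature by feature, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The classic 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

The same function works for any number of features, so it can be used unchanged on four-dimensional flower measurements.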

First of all, what is K-Means?
It is an algorithm that is often used in clustering. Clustering is what we do when we want to split the observations in our dataset into distinct groups to make the analysis easier. Let's use a simple example. Imagine that you have a set of jobs that you are considering applying for. Each job that you found has certain descriptive features, such as salary, number of days off, how far it is from your apartment etc. Now, how do you pick the best offer? This is when K-means clustering comes in handy! Thanks to this algorithm, you can go over your feature space, compare the feature vectors [salary, free days, km from home] with each other and group them according to these three features. Although K-means clustering is considered to be a case of unsupervised learning, once you split your dataset into k groups, you will be able to give them particular labels according to their place in your n-dimensional space.

Okay, but how do we start?
Of course, you can skip the hard part for now and go and play with some of the examples provided by the Python library scikit-learn (sklearn), but we'll start from scratch!

We are going to group Iris Species into three groups! The dataset was taken from Kaggle, which is a great place for everyone interested in data science.
The dataset looks like this: it consists of 150 samples, 50 samples per species. Each sample has the following features: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm and a label (the real species). It means that each observation has 4 features (saved in a vector) and a label that will not be taken into account for now.
Before we start any computations, we need to get rid of the unnecessary information, such as the column headers and IDs.
The description tells us that we have got three different species, so if we want to group the samples it would make sense to group them in three clusters.
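As a rough illustration of the cleaning step, the sketch below reads a CSV, skips the header row, drops the ID column and keeps the label aside. The column layout (ID first, four measurements, species last), the column names and the helper name `load_features` are assumptions made for this example, and the three rows are only there to make it runnable without the real file.

```python
import csv
import io

# A few illustrative rows in the layout assumed for the Kaggle file.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_features(csv_file):
    """Return (features, labels), dropping the header row and the ID column."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the column headers
    features, labels = [], []
    for row in reader:
        if not row:  # ignore empty lines
            continue
        # Columns 1-4 are the measurements; column 0 is the ID we do not need.
        features.append([float(value) for value in row[1:5]])
        labels.append(row[5])  # kept aside, not used for clustering
    return features, labels


features, labels = load_features(io.StringIO(SAMPLE_CSV))
print(features[0])
print(labels)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file object.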

Let's implement our K-means clustering!
My algorithm is slightly simplified. In general, the initial k cluster centers should be generated randomly. In my code, I first randomly shuffle the dataset and then use the first k observations as the initial cluster centers.
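The shuffle-then-take-the-first-k idea could look roughly like this with NumPy. This is only a sketch: the function name `initial_centers` and the `seed` parameter are assumptions made for the illustration.

```python
import numpy as np


def initial_centers(data, k, seed=None):
    """Shuffle the rows of `data` and use the first k rows as cluster centers."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # permutation() returns a shuffled copy of the rows; the input stays untouched.
    shuffled = rng.permutation(data)
    return shuffled[:k].copy(), shuffled


# Ten toy observations with two features each.
toy_data = np.arange(20, dtype=float).reshape(10, 2)
centers, shuffled = initial_centers(toy_data, k=3, seed=42)
print(centers)
```

Because the centers are real observations, they are guaranteed to lie inside the region where the data actually is.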
Below are the steps to follow when implementing the K-means clustering algorithm.
  1. Choose initial cluster centers. 
  2. For each cluster center (which is an n-dimensional vector), calculate the Euclidean distance to every data point in the space. This means that we should calculate the Euclidean distance between our cluster centers and all the flowers. 
  3. As a result, for each observation we obtain three distances (one for each cluster center). We assign each observation to the cluster center with the lowest distance.
  4. Recompute the cluster centers! Recomputing is done by taking the mean of the points assigned to each cluster. 
  5. Check whether the recomputed cluster centers match the previous ones. If not, repeat steps 2, 3 and 4 until they do match. 
Here are the very same steps, but in the actual implementation!
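As a minimal sketch (not necessarily identical to the implementation referred to above), the five steps could be written with NumPy like this. The function name `k_means`, the `seed` and `max_iter` parameters, and the choice to keep the old center when a cluster ends up empty are all assumptions of this sketch.

```python
import numpy as np


def k_means(data, k, seed=None, max_iter=100):
    """Cluster the rows of `data` into k groups; returns (centers, assignments)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)

    # Step 1: shuffle the observations and take the first k as initial centers.
    centers = rng.permutation(data)[:k].copy()

    for _ in range(max_iter):
        # Step 2: Euclidean distance from every observation to every center.
        # The resulting array has shape (n_samples, k).
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)

        # Step 3: assign each observation to its nearest center.
        assignments = distances.argmin(axis=1)

        # Step 4: recompute each center as the mean of its assigned points.
        # An empty cluster keeps its previous center (a choice made for this sketch).
        new_centers = centers.copy()
        for j in range(k):
            members = data[assignments == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)

        # Step 5: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return centers, assignments


# Two well-separated toy groups of three points each.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, assignments = k_means(points, k=2, seed=0)
print(centers)
print(assignments)
```

On the toy data, the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the shuffle, so only the grouping is meaningful, not the numbers themselves.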


Below, you will find two visualizations of the data (available thanks to PCA). As you can see, some of the datapoints were not classified correctly, although the general pattern of assignment remains very similar.
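For the curious, projecting the four-dimensional measurements onto two dimensions with PCA could be sketched as follows. This version uses NumPy's SVD directly and is not necessarily how the plots here were produced; the function name `pca_project` and the random stand-in data are assumptions made for the illustration.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project the rows of `data` onto their first principal components."""
    data = np.asarray(data, dtype=float)
    # PCA works on centered data, so subtract the mean of every feature.
    centered = data - data.mean(axis=0)
    # The rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Random stand-in for a (n_samples, 4) feature matrix.
rng = np.random.default_rng(0)
sample = rng.normal(size=(30, 4))
projected = pca_project(sample)
print(projected.shape)  # (30, 2)
```

The two resulting columns can be used as x and y coordinates in a scatter plot, with each point colored by its cluster or by its real species.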





As an algorithm, K-means is really simple. What may be difficult is the theory and the things that need to be considered when implementing it. Machine learning is not only about data; it is also about the knowledge and experience used to verify ideas about data analysis. I will try to cover some general thoughts about K-means in the next entry. In the meantime, feel free to check out my code, experiment, and post comments and thoughts. I will be happy to learn from you!




Sunday, March 12, 2017

Intro

Hello everyone!

I would like to introduce myself and the topic of this blog. 
My name is Agata and I currently live in Copenhagen, where I am doing my Master's degree in IT&Cognition. English is not my native language, but I hope that it is not going to be an obstacle for those of you who are interested in what I have to say.
The first thing you should know about me is that I am a baby scientist, which means that I am currently doing anything I can to enter the world of science for real. Amazingly, my fields of interest include technology and cognition, which happens to be exactly what I study!
In this blog, I am planning to share information about cognitive science and technology and how these two things can be combined. I was inspired to create it by a Polish contest (www.dajsiepoznac.pl).
Of course, I am not a great blog writer. So far I have mostly written scientific articles, but I hope that is going to change! I would like to tell you about what I learn in very simple language. I will not lie and say that I understand everything immediately and am just waiting to apply this fresh knowledge in some advanced machine learning project. A typical situation looks more like me spending hours trying out simple pieces of code and reading about the theory across half of the Internet. Only then does my solution have a chance to work. The good thing, however, is that once I understand it, I can find simple examples and toy problems to explain the idea to other people! And this is exactly what I am planning to do!

I hope that thanks to my posts, you will be able to understand how modern artificial intelligence works and how it is related to actual cognitive science. We will start from the basics, because that's where I am right now, but I'm hoping to reach some more advanced stuff soon!

Thanks for reading!
Also, if you are interested in more artsy stuff coming from me, you can check out my Polish tumblr: http://herztier.tumblr.com. There is not much going on over there, but maybe something will change.