Social Network Analysis

So hello, everyone. So I’m Saptarshi, in case
anyone doesn’t know me yet. So the topic today is
Social Network Analysis. So I start with a few
customary things. First, thanks to
Microsoft Research for bringing me here and giving me
the opportunity to present this. Also, like most of the talks
have started with questions like how many are linguists? How many are this, and that? How many are computer
science people out here? Okay, so that’s a bit consoling
because my talk would be a bit biased towards that. However, I have tried not to
go into any technical details as much. So they should be
accessible to others. And as always, please feel free to ask any
questions at any point of time. So I’ll talk about
social networks. So let me start with
a bit of history. So, social networks in the
offline world have been studied for the last several decades. I mean, that is much before the
study of online social networks, which were not even
there before, say, 2000. So the social networks in the
offline world which were studied are mainly,
you can see some examples. For example, friendship networks
among students of a school, then members of a club then
collaboration networks among scientists and movie actors. So say several scientists have
collaborated with each other to write a paper, then say they’re connected to
each other in a social network. Or if multiple actors have
acted in the same movie, then they’re connected
to each other. So these kind of networks used
to be studied in the offline world for a long period of time. And the social network theory
is quite well developed. So there are several
sociological theories, some examples of which
you are seeing here. And these sociological theories
have often resulted in these quotations or
sayings as we know. For example, birds of
a feather flock together. This is typically the result of
different sociological theories, that people who have
similar tastes, or who are similar in some way,
tend to form groups or tend to be connected to each other
over a social network, right? I mean, technically this
characteristic is called homophily, but in common terms, it’s like the birds of
a feather phenomenon. And in fact, as you can see, for
these two theories I have put a date because I know of a paper
which first proposed that. But for homophily, I don’t even
know which study was the first which kind of officially
proposed that theory. So it dates quite far back, to even
maybe early in the 20th century or something. So in 1967, possibly one of
the most famous experiments in sociology was done by
a person called Milgram. So what he did is, I mean, it’s difficult to imagine
that he did this experiment. He actually handed out letters, letters to different
people in the US. And so these letters were addressed
to people in other cities. So maybe he was doing this
experiment in say, Chicago. And the letters were addressed
to someone, say, in New York. And the persons to whom the
letters were given did not know the person to whom
the letter was addressed. So what he said was, hand over
this letter to someone whom you know by first-name basis or whom
you address on first-name basis, such that this letter has
a higher probability of going to that person to whom
it is addressed. And ask that person
to do the same. So I give you a letter and tell you to give it to someone
of your close friends, such that maybe someone in,
say Delhi gets the letter. So this continued,
this experiment continued and many of the letters
did not reach. So most of the letters, as
I said, did not reach the destination, because it’s
a very difficult thing to do. You are being given a letter
addressed to someone whom you don’t know,
who lives in another city. And you can only hand it over
to one of your close friends, someone who you know
on a personal basis. So usually you will think of,
okay, maybe this person has a higher
probability of knowing someone in Delhi than I. So you will give
the letter to him or her. So anyway, this was
a difficult experiment and most of the letters
did not reach. However, about 25 to 30%
of the letters did reach. And for
the letters which reached, it was seen that the number
of persons through which this letter had to go was close
to five or six, not more. So that led to this theory
called six degrees of separation, which said
that anyone in the US is connected to anyone else by a path of
at most six hops. Great, so yeah. That was the theory of six
degrees of separation. In more technical terms, it means that a social
network can be quite large. However, if you take two
people in the social network, usually there is a short path between them. And by the way, most of these
theories have been validated on the online social networks, which I’ll come to later and
these hold. So for example,
just to put this in context. This study has been
tried out on Facebook. This study was done
by Facebook itself. Therefore, they had
access to the whole data. So the Facebook network is
a network consisting of, maybe, a billion users. And at that point of time, there
were about 800 million users. Even then, they found that
the degree of separation, that means, you take two nodes
or two actors in a network and measure the shortest distance
between them, on average. It comes out to less than 6. It was between 4 and 5. And so even for a network of
that size, of a billion nodes, this holds. Another famous theory is called
the strength of weak ties. So usually, our ties,
the social ties we have, the social links,
they’re not all equal. Some are strong, some are weak. How do you define strong and
weak? There is no fixed
consensus about it. But intuitively, people
with whom we interact more, we have strong ties with them. Maybe we meet them every day. Whereas there are some people
to whom we maybe write an email once a week or even less,
those are our weak ties. So everyone has some strong ties
and possibly many weak ties. However, the weak ties are very
strong in one other aspect. That is, well, let me ask you. So it’s a weak tie, you do not
interact with this person often. What do you think can be
the strength of such a weak tie?>>[INAUDIBLE]
>>Yes, it connects two disjoint,
maybe, communities. So?>>[INAUDIBLE]>>Yes, but maybe a practical
utility of a weak tie. So the idea is this that people
with whom you have strong ties, maybe they are in your family or
in your job, in your workplace. They won’t be able to tell
you something very new. Possibly, they know
almost what you know. Whereas if you have weak ties, maybe someone in another
city altogether, he or she may actually give you
some very new information which you do not know. It has actually been seen that
weak ties are quite useful during a job search. So maybe people in your company
are not telling you of a new job. But someone else, maybe an old
school friend with whom you have just a weak tie now, he’ll
tell you of a new job, maybe. So those are some
cases where weak ties, the name is still weak ties,
but they are strong. And apart from that, there
have been other, slightly more complex theories, like how epidemics
spread in a social network. So any disease or any convention which may
spread through contact, which might be physical contact,
as well as maybe just seeing someone do something,
you also repeat it. So how they spread in
the social network, so these theories are there. Yeah, so now, as I said, these
studies were all going on in an offline world, in social
networks in the offline world. So how were these studies done? Actually, so for this study,
this is a very famous study, researchers went to a US school and
asked the students, who are your friends,
how close are they? Who are your strongest friends
and questions like this. And they did this study, which was very famous
at that point of time. But then, as you can imagine,
how many people can you do a survey with
in an offline world? Maybe 200, maybe 500, maybe 1,000, and that was the scale
of these studies. So as you see,
I mean, these kinds of studies, they were done with maybe 400
students in the school, right? So I’ll come back to
this picture later, to explain what this is,
but the scale is this much: a few hundred, or at most
a few thousand, and so on. Okay, then came online
social networks, right, this is
a standard picture. I don’t even know some of
the networks out there. I mean,
there are too many nowadays. So then came online
social networks, and they brought a huge change. So they quickly became very
popular around the web. There are billions of users, and there’s a huge diversity
among the users. There are celebrities,
media houses, politicians, common people. Also, there are bad users, or
evil users, some might say, like spammers, cyber-bullies,
hatemongers and so on. They are good for us. Advertisers wouldn’t be able to do the project if
hatemongers were not there. But yeah,
there are usually evil people. So there are both very popular
well-meaning users who give a lot of valuable information. Also there are these evil users. And also, the social networks
have started having a huge impact on business. A lot of advertisements and all are driven by
the social networks. So, I mean I mean at
one point of time it was maybe the advertisement
on the search engines, now it has become advertising
on the social media. So people pay a fortune to get
things to advertise on Facebook and to target people on
Facebook and whatnot. So this was the popularity
of social networks and also a lot changed for
the researchers. First of all, huge amounts of data
are relatively readily available. You no longer have to go and
do surveys of 500 or 600 people. You can write
a computer program and get data on half a
million people. Also, the data is usually more
reliable because, think about it, if I go in the offline world and
ask you, tell me who your friends are and
tell me your five best friends. First of all, you might not be
able to remember all of them. Next, you would not be willing to offend the others by telling
who your very best friends are. These kinds of things. The human factor
comes into surveys. Those surveys are definitely
necessary for some other things. But just for collection of data,
it’s easier or more accurate to
automatically get data. So I can immediately figure
out that maybe on Facebook, with whom have you interacted
most in the last month or so? And based on that, I can define
who are your closest friends. And so we have these,
we have seen this earlier, we have these three
V’s of big data. And all of them are present
in social media. There is a huge volume of
new content every day, a huge variety in the data, and
also velocity. When something
important happens, especially like, say, elections
in the US and all, we have actually seen thousands
of posts being posted per minute. So velocity is really,
sometimes, we do not like
so much velocity. It’s to that extent. So because of all this, there has been a huge
multi-disciplinary research going on on social networks. These are just some of the
fields which are contributing. Maybe the technology, the social
network technology, is being developed by computer networks and
distributed systems people. Because these systems
need to be real time, so that what someone posts
in a faraway land has to come to your timeline in
a few seconds or less than that. So these are huge networked or
distributed systems. Then we have sociology, social psychology,
I’m not going into details. We have lots of topics here.
Network science is usually people who do theoretical studies on large
graphs, or large networks. That is network science,
or complex network theory. And there are fields
like data mining, machine learning, and all. I have clumped
them all together, because usually these
deal with data. Now of course there are a lot
of overlaps and there are many studies which use both
the network and the data, right. So, but in general these
are some of the fields, and still there
might be others. Okay, so,
in the first part of this talk, I’ll just informally talk about
some of the research issues. And also help you understand maybe some of the properties
of the social networks. So as I was telling you, there
are many sociological theories which have been developed
on offline networks. They have been validated
on social networks. Online social networks,
I must say. Things like so strength of weak
ties, homophily and so on. So this was an interesting
study, as I’ve said. So an interesting question can
be: how do conventions spread in a social network? Again, this is a study which is
traditionally difficult to do in an offline setting. Because often with conventions,
people kind of follow them without realizing
when they first did it. But if you have a digital footprint
of, say, tweets posted and all, then it’s easy to figure out
who was the first person who did this, then it spread to whom, and
so on. So this was an interesting study. They studied the convention
of retweeting in Twitter. So they had the log of all
tweets in Twitter till a particular point of time,
I think in 2000... yeah. So all tweets from the start of
Twitter to 2011, they had. So they studied
the convention of retweeting. So for retweeting in Twitter, the most popular convention
is to do it via RT. You put an RT in front of
the tweet and you then retweet. But there were other conventions
earlier. For example, “via”. Via was another way
of doing a retweet. So initially when Twitter
started, first few, maybe first couple of years, there were different
conventions of retweeting. Ultimately, this convention
of RT dominated and became almost the de
facto standard. So how did that happen? So how did several conventions
spread and ultimately, one of them became
the de facto standard. That was the goal of the study. So I said, different studies
have tried to compare between offline social networks and
online social networks. And of course,
there are many similarities. However, some differences
have been observed, and they are there for certain reasons. For example, so I think all of you have
heard of the Dunbar number. So the Dunbar number is usually
a number around 120, and it says that,
for a human being, it’s difficult to have more than
120 meaningful relationships. Meaningful social links, but that meaningful
is a loaded term. What do I mean by meaningful? So this theory came from
offline social networks. Usually, someone
would not have meaningful relationships with more than 120
people. But think of Twitter, Facebook, and all. People have thousands of links,
and that is typical, because the cost of maintaining a social
link is close to zero, right? That’s in an online setting. Also, geographical distance does
not matter, so someone in India and someone in the US
might be interacting every day. So let me show you
a nice picture, so this is a locality of
friendships in Facebook. As you see, so as you understand
the brighter portions have more Facebook users, which is
kind of understandable, US, Europe, India also has
quite a large population. But if you see these,
so these kind of arcs, these are showing
intercontinental friendships on Facebook, and
there are quite a lot. Much more than what you would
expect in an offline setting. Right, so there are many people
in Europe and the US and even Europe and parts of Latin America who
are close friends on Facebook. They interact very regularly, this is difficult to achieve
in an offline world. Okay. So, okay, next I will include a bit of technicality,
but I will keep it very limited. So one thing is to study these
theoretical issues, and that is typically done on a
graph model of the social network. So social networks are
typically modeled as a graph or network, where users are typically
nodes or vertices, and the relationships between them, the social links,
are the edges or links. And there are different
varieties of networks, like undirected: Facebook is
an undirected network. So if you are a friend, so if A is a friend of B,
B is also a friend of A. It’s a mutual friendship, whereas Twitter is
a unidirectional network. You can follow, say,
Sachin Tendulkar, but Sachin Tendulkar does not
need to follow you, right? So it’s a unidirectional,
or directed, network. And other, slightly more complex
models have been tried. Some examples are here, so this
is called a bipartite network, where there are two
sets of nodes. Let’s give an example, say
users watch videos on YouTube. So let’s say these are users and
these are videos, and you put a link
between a user and a video if that user has
watched that video, right? So these kinds of bipartite
networks have been studied. They are useful, and
they are practically used. So any idea where
this can be used? Okay, I’ll come back later. Okay, so
these are bipartite networks. If you think this is too complex, let me
just show you another picture. This is a tripartite hypergraph,
okay. So a few years back, a kind of
social network became popular which was known as social
tagging sites or folksonomies. That was the term used. Things like Delicious, Flickr, and
so on. So there, users tagged
items with tags, or users annotated items with tags. For example,
Flickr was about photos, so there were photos on Flickr and
there were users. Users associated tags
with the photos. Right, so for example, there
is a photo of, say, the Taj Mahal. Users would say Taj Mahal,
Agra, monument, and so on. So there are now three types
of elements: users, tags, and photos. To model such networks,
tripartite hypergraphs are used. That’s not going to [INAUDIBLE]. So this is an extension
of the bipartite network. But here, every edge
connects three vertices, which is not the norm in
the normal graphs that we study. Anyway, okay, so now these different network models have
been proposed for networks. And people have defined properties of those
large graphs. So, as I said, there is this
field called network science, or complex network theory, which
studies properties of large graphs. By properties I mean
statistical properties. For example,
degree distributions. So degree distributions, again
I’m not going into details. Typically it would mean,
suppose you take all the nodes in the graph and
you measure their degree, and then you draw
a histogram of the degrees. Right, so what fraction of nodes
have degrees less than ten? What fraction of nodes have
degrees between 10 and 20? What fraction of nodes have
degrees between 20 and 30, and so on? Right, so in general, any ideas how the degree distribution of
a social network looks? For example, do you expect more nodes
having degree less than ten? Or do you expect more nodes having degrees
say more than 500?>>[INAUDIBLE]
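The degree histogram described just above can be sketched in a few lines of Python. This is only a minimal illustration, assuming an undirected network given as a list of edges; the function name `degree_histogram`, the bin width of 10, and the toy data are hypothetical and not from the talk.

```python
from collections import Counter

def degree_histogram(edges, bin_width=10):
    """Return {bin_start: fraction of nodes} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge adds one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    # Group nodes into bins [0, 10), [10, 20), ... by their degree.
    bins = Counter((d // bin_width) * bin_width for d in degree.values())
    n = len(degree)
    return {start: count / n for start, count in sorted(bins.items())}

# Hypothetical toy data: one "hub" linked to 12 users, plus one extra friendship.
edges = [("hub", f"u{i}") for i in range(12)] + [("u0", "u1")]
hist = degree_histogram(edges)
print(hist)
```

In this toy graph, 12 of the 13 nodes land in the first bin (degree below 10) and only the hub lands in the 10 to 19 bin. Users with no links at all never appear in an edge list, so they would have to be counted separately.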
>>Yeah, let’s focus on social
networks here.>>[INAUDIBLE] more
number of people who have friends greater than ten.>>Or at least in the range
of ten [INAUDIBLE].>>Okay, any different opinion? So okay,
when answering this question, maybe we think of our friends,
yeah. I mean, by the way, this is
the common psychology that every friend of mine has
more friends than I have. That’s known to be
a common psychology. Actually, it can’t be,
it’s a paradox. It can’t be. Everyone thinks that all my
friends have more friends than I have, but if that were true for everyone,
then there would be a paradox. Okay, so when answering
this question, usually people go by that mindset, but the truth
is actually very different. There are a huge lot of inactive
users on any social network. So the fraction of nodes having
degree, say, less than 10, even 0 or 1, is much, much higher than the fraction of nodes having degree,
say, more than 100 or something. And it’s not even a linear drop. It’s an exponential
kind of a drop. Okay. So again, I’m not going
into details, but, yeah so yeah, then another
thing is triangles. So okay, I put in here,
presence of numerous triangles. In a social network, usually
there are lots of triangles. Are you surprised, or
is it something to be expected? It’s kind of expected. Because usually, again, statistically speaking,
if you have two friends, they have a high probability
of knowing each other. Again, it’s not true
always. Maybe one of your
school friends and one of your friends at work would
not know each other, possibly. But if you think, you will have
many school friends who will know each other. So there will be a lot of
triangles around you; among all your friends there will be
many pairs who know each other, or who are friends
with each other. So usually social networks
have a lot of triangles. And by the way, we need to
do computation on these large networks. If you remember, Pretty
was talking about doing search on large networks. So there is a KDD paper. KDD is the top conference
in data mining. There is a KDD paper on
efficiently counting triangles in a social network. Just this much, but I mean, technologically
that’s a huge challenge in a network of the size
of Facebook. With a billion nodes, counting triangles is also
a pretty difficult challenge.>>[INAUDIBLE] exactly or
approximately?>>Approximately, yeah. I don’t think anyone would
try [INAUDIBLE] accurately. Okay, six degrees of separation,
again. Another term used
is “small world”, so I think we use this as
a part of common English. So we go to a faraway place and
still meet a friend of ours. We say it’s a small world. So again, there are technical
meanings of what a small world is. I’m not going into them, but typically it means that
even if a network is very large, with lots of nodes or
vertices, even then,
if you pick two nodes, or users, you’ll find a short
path between them. Yes?>>[INAUDIBLE]
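One rough way to measure this short-path property is to run a breadth-first search from every node and average the hop counts. The sketch below assumes a small undirected graph stored as an adjacency dictionary and simply skips pairs that have no path between them; the names `bfs_distances` and `average_separation` and the toy graph are hypothetical, not from the talk.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_separation(adj):
    """Mean shortest-path length over all ordered pairs that are connected."""
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: a chain a-b-c-d and a separate pair e-f.
adj = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "e": ["f"], "f": ["e"],
}
avg = average_separation(adj)
print(round(avg, 3))
```

Running one search per node is fine for a toy example like this, but it would not scale to a network with hundreds of millions of users, where sampling or approximation would be needed.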
Yes [CROSSTALK]
right.>>Then the distance on average,
is it still?>>So people change the
definition of shortest path a bit, and say, so if there
is no path between two nodes, take that as
infinity, and therefore it contributes zero
to the average. Things like these are tried out. But you are right that this is assuming that
the graph is connected, yes. And by the way,
let me clarify one thing, I’m not trying to say that
you pick any two nodes in the graph that will have
a distance of less than six, no. Maybe if you pick me and say,
Donald Trump, the distance would be much larger, and I’m-
>>[INAUDIBLE]>>Okay, maybe. Okay, so maybe another
person as infamous as I am, then there will be
a huge distance. There might be a huge distance. But again, we are talking
in statistical terms. Let’s say, I mean, the mathematically correct
statement would be that 99% of all pairwise shortest distances
would be less than six. And there is this homophily
which I told you about, that birds of a feather
flock together. In technical terms, there
are terms like assortativity and all, which I’m not going into. By the way, again homophily
also depends upon, when I say similar nodes
are usually connected, it will depend upon what characteristics you
are looking at. For example, these studies have
been done. Okay, let’s say in Facebook, the
study which I was telling you about, which studied
the Facebook network. They found that the Facebook network
is heavily assortative, or there is a lot of homophily,
based on certain properties. Whereas there’s not so much homophily based on
some other properties. For example, if you think about, let’s say,
an offline world also. If you think about gender, then maybe the network
won’t be that homophilous. If you think about romantic
relationships, and the property that
you study is gender, then the network will not
be at all homophilous. There’ll be no
homophily in the network. There’ll be a very small
amount of homophily. But if you think about, say, age, usually people of the
same age group are more friends. So that was what was
seen in Facebook also. If you consider
age of the users, then the network shows
very high homophily. However if you
consider gender and all then homophily
is not that much. Okay, so these are the different
network properties which people study on networks. Typically starting on
statistical properties of large graphs. Okay, so this picture I have
shown you a little is this friendship network among
US students in US school. So now you see that there
are three types of nodes, white, black, and others. So actually, I mean these
researchers went to these students and asked for their
race, like white, black, and other and whom they
are friends with and all. If you see,
most of the white nodes here, they form connected groups. They are very closely connected
in the social network. Whereas the black star here,
these two. Right, so these, to actually
conceive four densely connected parts in the network, one,
two, three, and four. These are usually
called communities. Okay, so our densely connected
part of the network which is a lot more closely connected
in site that part, than with the others, than
with the rest of the network. That’s usually
called communities. So first of all so this network
could be quite like highly associative art there is large
homophily here according to this, sorry, according to this. And usually there
are these communities. So white students are forming
one of two communities, blacks are forming
others as well. Of course, social networks are
difficult to study because a lot of factors get built. For example, I might ask why
did all the white students not belong to the same community? That might be a question to ask. Maybe these were of different
classes, maybe these were junior school students, maybe these are
high school students and so on. So actually, many factors
come into play when we decide whom to be friends with and
whom to be loosely indirect. And that makes the study of
social network difficult. Okay, we are not going this much
slide so as to start statistical properties seeing for
the social networks. Now, people have tried to
come up with models as to how come this networks
looks like this? See, typically if you think of
people randomly connecting with each other. Then you will not get
a network like this. The network will
look very different. But social networks do not
look like random networks. Rather, they have
specific properties. So they try to model human
behavior as to how come these networks are forming? So I’ll just describe
one briefly. So if you see this thing,
preferential attachment. It’s possibly one of
the most famous models of how social networks develop and
it says something simple. When a new user joins
a social network, he or she is looking for
whom to connect with. The probability of this new user
connecting to an already popular user is higher. So the probability with
which this new user connects to another user is
proportional to how popular that existing user is. Right, again,
these are statistical models. Don’t expect that every
user behaves like this. But typically it is seen that
these kinds of models can explain some of the properties
of the networks, though that must be taken
with a pinch of salt. I would just point out
one thing in this slide, without going into
technical details. If you see these years,
1999, etc., these models are only considering
the network structure. So they try to explain
human behavior based on graph properties. So I’ll see how popular a
user is, then I’ll connect. But going down, you see
co-evolution of social and content networks, 2012. So these more recent models
start saying things like, not only will we look at the
graph, or who is more popular or not, but
I also look at the content. Does that person post on
a topic I’m interested in? Right, maybe there is
a politician and a movie actor, both are very famous. Even then, if I am interested
more in movies than politics, I am more likely to connect
with the movie actor. So there is a content
network also, apart from the social network,
and they are co-evolving. Right, so these are slightly
more complicated models, but it has been found that
these models explain more properties than maybe only
the network-dependent models. Okay, now another question that
always comes up is, do these properties
remain constant? Over time, I mean. Some properties do,
but many don’t. For example, by the way, I mean these social networks
are highly dynamic. Every day maybe thousands of
people are joining Facebook. Maybe tens of thousands of
new links are being created. Even then,
the Facebook network is so large that these
are all delta changes. It doesn’t usually change the
overall statistical properties of the network, even though
this is happening every day. But then over a period of time, maybe if you consider
three years, four years, ten years, then maybe some
properties have changed. For example, these studies have
noticed that network density, assortativity, they vary and
the variation is complex, as in, it’s not monotonic. Sometimes they increase at
some stages of the network, then they decrease, and so on. In fact, people have
studied the behavior of users, and they have not
stopped there. Now, maybe some groups, like Leskovec,
so Jure Leskovec at Stanford, that group,
they collect a lot of data. So now they have graduated from
studying users, and they are now studying properties
of networks as a whole. So think of a social network,
a new social network comes into being, maybe it becomes
popular for some time, but after a while there’s a decay. For Facebook and Twitter, we are
still not seeing their decay. Though some newspaper reports
point out that Twitter is decaying, even then. Maybe for Facebook and Twitter
we are not seeing the decay. But maybe some of you
remember things like Orkut. At one point of time it was very
popular, now it has decayed. So properties of networks
change over time, if you take a long enough time. So maybe Orkut was
very dynamic once, but now it’s no longer dynamic. So different properties evolve,
and the properties themselves evolve
over time, not only the network. And there have been many
models to explain these kind of temporal variations. So I guess this is similar to
some of the questions we have heard in other talks. How does a society collectively
move toward something? Or how does something
stop in a society? So these kind of questions
have also been studied. Okay, Now, there have been
studies on different types of links, one of them I told you,
strong and weak links. So how to distinguish between or how to classify between
a strong link and a weak link? Suppose there is this
link between two users, how to understand whether
it’s strong or weak? So those kinds of things
have been studied. Also, there have been
some kind of fun social networks which allow both
positive and negative links. That means these networks allow
you to indicate some people as friends and
some people as enemies. So those networks have
also been studied, but for some reason, they have
not become very popular. Probably because you do not want
to tag someone as an enemy. Right, so these networks have not
usually grown to that extent. And yes, so again, the strength
of links varies with time. You can understand that in
the offline world also. Maybe in this workshop,
on the first day, how many people
did we talk with? Whereas now, the strengths of
the links have grown over time. These things have
also been studied. Okay, this is another type of
question which is important: how important is a node
in the network, right? So for example, suppose you
want to start a movement. You just want to start
a social movement. Then you have to get
some thought leaders, or people who are influential
in society, on your side. So who are the influentials
in a society? And again, say, when we do
web search, Google or Bing, we want important
webpages to be returned. So the webpages which
are returned at the top of the results, they should be
relevant to what the query is, plus they should be important. Right, so
it’s an important question, a practically important
question to ask: which of the nodes, or which users, are
important in a social network? And there have been many
proposed measures or metrics for measuring
how important a node is. That, we will not go into in much
detail, only let me just quickly say, so degree centrality
you can understand. If a node is of high degree,
it should be important. Like if someone has lots
of friends on Facebook, if someone has millions
of followers in Twitter, then that node or
that person is really important. But that’s only one sort of
centrality, degree centrality. There are others like say,
betweenness centrality, let me just quickly
explain using a figure. So if you see this network, look
at these two nodes, A and B. Their degree is quite low. I mean, there are many other
nodes in this network which have higher degree than A and B. However, A and B have very high, what is called
betweenness centrality. The way betweenness centrality
is measured is, Okay, I mean, simplistically,
A and B are bridging nodes. They are bridging
between two communities. So again, technically, if you
take all the shortest paths between all different
pairs of nodes, most of the shortest paths
pass through A and B. Because you take one node
here and one node there in that component,
find out the shortest path. It has to pass through A and B. Great, so that is why A and B have
high betweenness centrality. They are between
many other nodes. All right, so
I have a question for you. Can you think of a situation
where, you think that nodes like A and B, which have
high betweenness centrality are more important than nodes
with high degree centrality? Let’s say, maybe, these are nodes with high degree
centrality, maybe high degree. Whereas A and B have high
betweenness centrality. Can you think of any
application where A and B would be more important?>>[INAUDIBLE]
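As a rough illustration of the shortest-path definition described above, here is a minimal Python sketch. The toy graph (two fully connected groups of four nodes joined only through A and B) is an assumption made up for this example, not the figure from the slide, and the function is a plain Brandes-style computation for small unweighted graphs.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality for a small unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


def build_example():
    """Hypothetical graph: two dense groups bridged only by A and B."""
    left = ["c1", "c2", "c3", "c4"]
    right = ["d1", "d2", "d3", "d4"]
    edges = []
    for group in (left, right):
        for i, u in enumerate(group):
            for v in group[i + 1:]:
                edges.append((u, v))
    edges += [("c1", "A"), ("A", "B"), ("B", "d1")]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


adj = build_example()
degree = {v: len(neighbours) for v, neighbours in adj.items()}
bc = betweenness(adj)
for node in sorted(adj, key=bc.get, reverse=True):
    print(node, "degree", degree[node], "betweenness", bc[node])
```

In this toy graph A and B each have only two links, fewer than any other node, yet every shortest path between the two groups runs through them, so they come out on top for betweenness.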
>>Yes, so often,
these nodes are, kind of, they form the link
between two communities. So if some information was
born in this community, whether it will go over
to another community, depends upon nodes like this,
A and B. And they have very
important implications. For example, this has been seen, you
know, with something going viral. For any story or
video to go viral, it has to cross these links. Usually, every piece of information is
born in a certain community. Whether it is adopted or
whether it becomes popular among different communities depends
upon maybe these users, who are forming the bridges
between communities. Okay, another application is,
suppose you are an advertiser. You’re a company which wants
to advertise your product. So you can target a few
people in the network. Ideally, you want to
target everyone. But practically, that’s not
possible because you have to pay Facebook to target people. Say your budget allows
you to target only 100 people, and not 10,000, so which 100
people would you target? These are good candidates for
targeting. Because if you can influence
them, they can relay the information to two
communities, instead of one. Whereas targeting this person,
even if he is influenced by your product and he chooses to
endorse it, maybe he can communicate the information
only to his community. Whereas these bridge nodes, they usually can reach out to
multiple communities, right. Okay so as I said, there are
different centrality metrics. You must have heard of PageRank. So very simplistically if I say,
so suppose there are two nodes which
have equal degree, so according to degree centrality,
they are equally important. However, someone might say
that all links are not equal. Maybe someone has five friends,
so say two people who have five friends, but for one, those five
friends are heavily influential, whereas for the other user, the
five friends are his classmates. So both have degree five, but then one is much more
influential than the other. And so this is typically the
reason why, say, students take recommendations while applying
for grad school or somewhere. And recommendations are not
taken from classmates, they’re taken from
important people. Okay, so that’s why a link from
an important person is more important than a link from
an unimportant person. So PageRank basically takes
those things into account: not only how many
links you have, but also whom
you are linked to. Again, there are sociological
theories like a man or woman also is known by
the company he actually keeps. So, those kind of theories. Okay, so yes, another thing
is the community detection. So communities are, as I said, subgroups of the graph which
are densely connected. So for example, if I take this
summer workshop as an example, there are typically say five
communities here, I might say. That does not mean that there
is no interaction between people from one community and
the other. That does not mean that, but
definitely within a community, interaction is much more than
between communities, say. So intracommunity interaction is much more than
say intercommunity, right. And also there have been
different sociological theories on how communities
are not the same. Typically, there is a well
known distinction between bond-based community and, okay,
forgotten the term, wait. Okay, so bond-based community
means the community consists of personal social links,
like a family. Whereas the other thing, I’m
sorry I’ve forgotten the term, is basically people who are
interested in the same topic. So think about it. So people who have gone
to see a football match, they have lot of interactions,
but that is not because they
are personal friends. I mean, they are interested in
the same match or same topic. So there are differences
between different kinds of communities also. So this is just
an ideal picture of what communities in
a graph look like. There are these densely
connected components, and the rest of the network
is very sparse. And again,
this is an idealistic picture; the practical picture is this. So defining communities is
also difficult, all right? Okay, so I’ve reached a slightly logical break
point in the talk. So till now, all the problems
that I was telling, the issues or the properties, they
are all based on the network. So the statistical
properties of the networks, how popular a node is and so on. As I was telling you
in the beginning, social networks contain both
the network and the content. So the text content, the videos, the images
being uploaded and all. So next, I would go towards some problems
which use the content more. So let me tell you of
an interesting study. By the way,
I’m sure none of you observed, but the title of my talk is
wrong if you go by the agenda. The agenda says I’ll be talking
on social media analysis, but I’m talking on social
network analysis. What’s the difference? I think you have heard both
these terms, social media and social network. Often the difference is subtle. For practical sort of point of
view, the difference is small. But usually when we talk of a
social network, we tend to think of a place where personal
interactions are going on. There are friends who are
gossiping with each other and so on, on such a social network. Whereas when media comes in, it becomes more of an
information-centric description, a media,
like a newspaper or a TV. Its main purpose is not
social interactions, but to give some
information to people. So there was this very
influential study in 2010 in World Wide Web Conference,
which asked this question. This was a study from KAIST,
in Korea. Is Twitter a social network or
a social media? And there was a lot of
statistical analysis. So when we say social network,
we expect certain properties. Like there will be a lot of
triangles, there will be community structure, it will be
assortative, and so on. Whereas when you talk
of a social media, the expectation is different. Social media would mean there are maybe a few information
producers and a lot of consumers. So think about it,
in the offline world, there are few newspapers but
there are many many consumers. Whereas in a social network,
everyone can consume, everyone can produce; social
interactions are going on. Like friends gossiping, so friends gossiping among
themselves versus a newspaper, where the content is produced
by only a few journalists maybe, but it’s consumed
by a lot of people. So these are the two
views, social network, and social media. Now that study in 2010 concluded that Twitter is more of a social
media, than a social network. Whereas Facebook, there have
been more studies after that. Facebook is more of
a social network. And if you think about it,
the very definition of a link in these two mediums kind
of classifies that. In Facebook, when you become a friend
to a person, that person also becomes your friend. Whereas in Twitter,
you follow a person. The motivation is to get the tweets that that
person has posted. That person might not follow you
back, he might be a celebrity. So, the motivation of Twitter
is to give you information, maybe not social gossiping. So that is the kind of distinction between
social network and media. However, both exist
simultaneously and most of the assessed
studies use both. Before going into the studies, let me just tell
you of something. So we have heard
of some theories in different talks
of this workshop. For
explaining those theories, in the earlier
talks in this workshop, the focus has been
on the content. But you might think of
the network as another channel of information. Let me give you
a couple of examples, we have heard of this
ingroup outgroup theory. Whether you are in group
with someone, or out group. So you have heard of things like
you usually change your language whether you want to be in
group or out group and so on. Another way of figuring
whether you are in group or out group is this kind
of community analysis. Maybe if you were in
a tightly clustered place, you have lots of common friends,
then you are in-group. So if the person whom you are
talking to, if you have lots of common friends and all, maybe
you are already in the in-group. So you will talk with him
differently than someone who is in a different community with
whom you do not have many common friends. That’s one, another way of maybe
figuring out whether someone is in group or out group. Another reason, another theory
which we have heard across, I mean, heard in different
talks in this workshop is the,
what’s that theory called? You try to accommodate the
>>Accommodation theory.>>Accommodation theory, yes. So suppose two persons
are talking, so this theory tells us, okay, both will possibly try
to accommodate to the other. But will it be equal? If A and B are talking, A will
try to accommodate to B style and B will try to
accommodate to A style. Will it be equal? Maybe not. It depends upon the social
influence, or the social, how famous that person is. We have heard things like
the more famous a person, the less polite he needs to be. Was talking about this, that
Christian [INAUDIBLE] study. So that means these linguistic
studies often depend on how popular you are in
the social network, which can be figured out using
centrality metrics, maybe. They give you centrality and
other things.>>What would be the kind
of the correlation between the popularity, the centrality
of your popularity and a power structure? Is that [INAUDIBLE]?>>So I think Christian
had another study. By the way, he was at MPI also. So luckily, I can refer to
him with his first name and not say okay.>>Jeez, you can say that.>>Yeah, and I had to, I had to. Yeah, okay, so
Christian had another study where he studied text written
by judges in US courts. So judges in US courts
have a strict power hierarchy structure. There are some kind of head
judges or more senior judges. Whereas, there are less jury
members and lesser staff, say. Okay, that study was not
done using networks. That was a pure natural language
or linguistic-based study. But it was shown that the lesser
judges tried to accommodate to the style of the more
qualified judges, right, whereas the opposite is much less. So how much someone would try
to accommodate to another person often depends upon this power
hierarchy in the society. Another way of understanding
that would be the network or the social network. Okay, with that
interlude kind of thing, I would just go to some of
the topics which utilize the information
contained in the USLs. Again, it’s quite mixed. These studies usually use
both the information content, as well as the network. So, search and recommendation. These are increasingly
becoming popular in online social networks. One reason, if you
think, is that these networks like Facebook and
Twitter have grown so much that, suppose an old
friend of yours joins Facebook. You have no way of knowing that,
unless Facebook tells you. So Facebook gives you these
recommendations, right, that, hey, this person
joined Facebook. Maybe you would like
to connect to him. Or so you are on Amazon. You are looking for
some product. Maybe there is an alternative
product which is cheaper and has equal quality. You have no way of knowing
that because of so many products in Amazon. So then you will
have to search or Amazon will have to
recommend to you. Hey, there is another product
which is kind of equal but cheaper, so you might take that. So as these networks have become
so large in size, search and recommendation are becoming
very important, from practical
applications also. Unless a social network gives
you recommendations today, it won’t be successful. It’ll simply not be popular. And the basis of all
this search, social search, social recommendation, is this: that friends are likely
to have similar tastes. If you think, this is
the same homophily principle. Birds of a feather flock
together kind of thing. So the usual assumption is you and
your friends will have more or less common tastes. So let me just show
you a graphic, sorry. This was a kind of survey. Which of the following forms of
advertising is more effective at influencing you to
make a purchase? Banner, display ads,
celebrity endorsers, print ads, TV commercials,
friend recommendations? Because our friends
are people whom we trust. If they have recommended
something, possibly I’ll also like it because their tastes are
kind of similar to mine, right? So that’s the power of
friend recommendations. So many things can be
recommended, new friends, which groups you would like
to join, videos and so on. Let me give you this example, this is recommendation
of books in Amazon. I have many times meant to change
this slide to something more interesting, but it still remains a C
programming book anyway. I’ve been busy, sorry. It’s okay, so this is a standard
recommendation page on Amazon, focus on this line. Customers who bought this
item also bought these books. So these are being
recommended to you. You are checking out this book,
and these are being
recommended to you. The explanation is this, customers who bought this
item also bought these books. So what’s going on? So why is Amazon
recommending these books? [INAUDIBLE]
>>So the assumption here is that people who have
bought this item would also be interested in these items,
therefore, they must be similar. Now think about it, instead of
this, so Amazon has quite a bit of success, I mean, Amazon has
a good business model, I guess, they have lots of
recommendations. But think about it, instead
of this line, if I tell you, three of your friends
also bought this item. The impact would be much,
much higher. Customers who bought this item
also bought this is okay, but if I tell you that three
of your friends, and maybe I give the names,
also bought this book, then the impact would
be much higher. You are more likely to check
out that book also, and then you buy it.>>[INAUDIBLE]
>>A study based on?>>[INAUDIBLE]
>>Okay, not for Amazon, I’m not aware of that, because these companies
don’t publish many papers. But in social networks and
all, the impact of social recommendations is well
studied, and I can’t remember
the exact numbers, but,>>[INAUDIBLE]>>I’m not aware of a study, because usually those data
are such that they cannot be published, things like this. So I’m not aware of a study,
but it’s a common notion, that is why most of the social
networks are shifting towards social recommendations. So usually the recommendations
have some broad categories. One is collaborative
filtering: figure out other users who
possibly have tastes similar to yours; that’s
collaborative filtering. And another genre of
recommendation is content-based: you have bought this item, maybe these items
are similar to that item, therefore they’ll be
recommended to you. So collaborative filtering
is going a bit out of that, I shouldn’t say that. The thing is, social networks
have become so large that, in case of collaborative filtering,
you have to do an analysis on the whole matrix of
the network and all the users. That is no longer possible
because the networks are so large, so you only focus on
the friends of the user, maybe two hops or one hop. That is a quick way, but it gives very powerful
recommendations, yes.>>So I don’t think I’ve ever
bought anything from what’s recommended to meet
my [INAUDIBLE]. I’m not sure whether
I’m a non-person, or that’s just the kind of thing
I [INAUDIBLE] because they have great service, and they sell
everything I [INAUDIBLE], but I don’t think I’ve
ever bought anything [INAUDIBLE]
>>So I had just the opposite experience, So when I saw
the books, I go to each and every book that I [INAUDIBLE]. Especially in computer science,
which I study the most. It’s a little more technical
than [INAUDIBLE] so I find it a lot of [INAUDIBLE]
interesting [INAUDIBLE] and they did with this. I know you probably be
wondering [INAUDIBLE].>>I’m in the middle, I cannot
buy a book recommended by Amazon cuz I find really
weird recommendations. But I buy a lot of toys
[INAUDIBLE] because I really have no way of knowing how
many 60,000 toys there are in this world. So if I know that this
Lego Nexo Knight that my son is interested in, and
then Amazon recommended, let’s see other kinds
of Lego Nexo things. I mean, [INAUDIBLE] how
the people buy [INAUDIBLE].>>So in one way, Amazon doesn’t
have the social network, it’s not maintaining any
kind of social network. So the only source of
information Amazon has is our search history, what things
we have been searching, and these kind of things that common
customers have bought this item. But things like Facebook and all mediums will have
the social network. And they’re increasingly moving
towards utilizing the social network for recommendation.>>[INAUDIBLE] books
I have of different [INAUDIBLE] which I trust. I’m pretty plugged in
to it [INAUDIBLE]. But [INAUDIBLE] I don’t
have a [INAUDIBLE] But like what they needed
like this sorts.>>Also I think it depends
on the types of the books, I don’t think satirical,
I don’t know, it was like information for
fiction.>>For fiction, though, it just
doesn’t work, even otherwise, I think Goodreads has
better recommendations. So there are forums and
places that you->>Well, [INAUDIBLE] and->>Books, yeah, so that’s why I think it’s a better
network for certain things. But for certain things,
I don’t have [INAUDIBLE].>>But sometimes there’s also
[INAUDIBLE] books on Indian classical music,
I hardly find [INAUDIBLE], but Amazon have very [INAUDIBLE].>>Maybe,
it might be very specific, but normally I find Amazon book
recommendations very weird.>>And so the utility of
their recommendation. [INAUDIBLE] So, like he said, there’s different
kinds of people. Some people don’t know
what they want, but they allow recommendations. And some people know exactly
what they want to buy. So they don’t buy [INAUDIBLE]. I know some people
>>Yes, so I think what most is
maybe better than here that if you have a strong
social network that provides you that information or
that recommendation, then you probably are not
going to use this. And if you know-
>>[CROSSTALK].>>Yeah.
>>[INAUDIBLE] friends [INAUDIBLE] those details.>>Yes, kind of. Let’s take a look at.>>That’s scary because
knows who your friends are.>>Facebook is.>>Facebook.
>>So you.>>It works easy enough to
create separate accounts. You still log in with Google and
Facebook or for that.>>And then he just given up.>>[LAUGH]
>>But it is possible that you trust
that you place with a link, with a known friend and
with someone else.>>[INAUDIBLE]
>>[INAUDIBLE]>>Yeah, of course, of course. Okay, I have an interesting
story to share about recommendations, but
I’ll postpone it for some time. Okay, so search and
recommendation, another important thing to study on
social networks is the spread or diffusion of information. Here are some of the questions
which usually are interesting. So this thing is
quite difficult. How does a topic or
video become viral? There have been some studies
which have tried to figure this out. But then a large part of
the conclusion has been, it’s difficult to predict. If something takes on the
imagination of the population and then it becomes viral, it’s
not always socially explainable why it is that something
becomes viral. And definitely the content
does not explain it. So there are as many or better
items, maybe movies, songs which do not become viral compared to
something which becomes viral. Okay, this is another
picture of how a viral image spread
in Facebook, right? So, okay, okay, this thing I’ve
already talked about earlier, but I’m going to look
at this in a new light. Identifying influential users. So we have talked about
centrality measures and all. And there have been several
centrality measures used on social
networks like Twitter. However, what if I tell you I
want influential people in politics? Then the graph alone
will not tell you. Sure, politicians have lots
of links, so maybe their centrality is pretty high and so on. But how does it tell you that
this person is an expert on politics? That, the graph alone
won’t tell you. For that,
you need the content also. So there had been several
studies which tried to identify topical experts. There have been studies
from MSR like [INAUDIBLE] I think he’s in Redmond,
I’m not sure. He has one study, and we had
one study where we identified topical experts on Twitter. So there we used a nice
combination of content and networks. So usually what people
did earlier is that, so suppose we have a user. So we have some content for him
and the tweets he has posted, what information he has
in his profile and so on. The past studies tried to infer
whether he’s an expert on this topic from this content
that we see he has posted or what he has put in
the profile and all. But typically, tweets are noisy. People have very
incomplete profiles. Therefore, these methods
were successful only to a degree. We used a social way of
figuring out topical experts. There was a feature,
Twitter Lists, by which you can
create a list called, say, politicians, and
add other members to the list. Say you create a list called
politicians and you add, say, Barack Obama, Trump, Clinton,
these people to the list, okay? That gives us a social way
of understanding: many other people are telling that you
are an expert on politics because they have created lists
called politics, politicians, and so on, and
have added you to the list. That means many other
people are willing to hear what you are posting
on politics. So you must be
an expert on politics. That was our way of
figuring this out. So this we published
in SIGIR 2012. We built
a small system on Twitter for topical experts. As of 2012, that system performed better
than Twitter’s search engine for a large number of topics, as
judged by human volunteers. And at that point of time, Twitter did not use
the Lists feature. So when I was presenting
this work in SIGIR, the Twitter search head was
there among the audience. So he came up and
talked to me that, okay, this is a nice thing and
we have not explored this yet. But in a few months, Twitter
started using lists also, and now their search engine is
much better, of course. Okay, so again, the key
point here is that the graph can tell you whether
a person is an influential or a popular person. But if you want to evaluate
topic-specific expertise, then you’ll have to take
recourse to the content also. And that’s what I was saying, many studies use both
the network and the content. Another very well known field
of research is the emotion or opinion mining. And it has very
many applications, whether a movie would
be successful or not. So trying to get
the public opinion automatically from
the social network. Again, the
traditional way is to conduct surveys. Surveys often give you
a lot of detailed information. But they are expensive to
conduct, therefore more and more companies are going towards
understanding opinion about their products from social
networks automatically. And even Twitter has been used
to predict election results. So there have been
both types of papers, papers which try to predict
election results using Twitter data and
claim that they’re successful. And there have also been papers
which have simply trashed this: there is no point trying this,
this is not possible. Same goes for the stock market. There have been papers which
have tried to correlate the stock market with Twitter,
mood on Twitter, and so on. And there have been some other
papers which have said just stop, it’s not correlated. Okay, so there is a lot of
opinion among these opinion mining studies also. Okay, this I guess, I mean,
possibly the most popular area of study on social networks
is spam detection. So as these social
networks become more and more popular, more and
more spam comes in. And spammers have
different techniques. For example, Sybils; is anyone
familiar with the term Sybils? So Sybils are when one spammer
creates a lot of accounts. So typically, a lot of accounts
controlled by a single spammer, those are called Sybil accounts. So normal spam has
been kind of handled well, I mean, there are good classifiers
and all for normal spam accounts. But then spammers have
shifted to Sybil accounts, creating Sybil accounts and all. So a large challenge for today’s social networks is
to stop Sybil accounts. And if you have observed
over the years, creating an account
has become difficult. At one point, anyone maybe could
create an account on Facebook. Then they started linking it
with an email ID. Then they started linking it
with a phone number. They will send you a notification and
all. A lot of these efforts
are to stop Sybil accounts, so that one person cannot create
a whole army of accounts. Okay, and by the way,
Sybil accounts are used for very strange purposes. So before the last US election, there were reports that
a certain politician, not Trump, but a certain politician had,
okay, see, in the US, Twitter is so popular that
even in election propaganda, candidates go and say, I have so
many followers in Twitter. That’s why I am more popular,
vote for me. So many people listen
to what I say. So there was this report that
one candidate had actually bought a large number of
followers in Twitter. So basically,
he bought Sybil accounts. So there are marketplaces where
you can go and buy Sybil accounts.>>So
I know some studies in China, have there been
studies in India?>>About what?>>Politicians using media.>>I’m not aware.>>That-
>>But so [CROSSTALK] use of social
media in Indian politics has just started,
maybe last couple of years. So->>No, I saw articles in newspapers
were quality data.>>Okay,
I’m not aware of any studies. It would be nice to do it, yeah.>>[INAUDIBLE].>>But in China,
[INAUDIBLE] what China thinks. So most of the people are from
outside and [INAUDIBLE].>>Thank you. Staying in this-
>>[INAUDIBLE].>>But I mean China and
the United States. China [INAUDIBLE] one
[INAUDIBLE] thing in the [INAUDIBLE]. You get attacked if you,
I couldn’t imagine. Yeah, I mean,
look at the [INAUDIBLE]. [INAUDIBLE].>>Okay. So the other utility
of social networks, on which our project is
also based, is that Twitter, especially, has become a very
important source of real-time information on events. So here is a comic strip. So it says that when
there’s an earthquake, what will reach you first? You will first feel the tremors
of the earthquake, or you will first get a tweet
saying there’s an earthquake. Okay, so that’s the comic strip. The conclusion is finally
that sadly Twitter users’ first instinct is
not to find shelter, but to tweet that there’s
an earthquake. Okay, so this comic strip, at least I came to
know of it in 2010. So hopefully it was
published then. So it was a nice comic strip. People laughed about it and all. And in a few months,
there was this paper in World Wide Web 2010,
which showed that this is no longer a comic strip but
this is the reality. So this study was done
by a group from Japan. Japan has lots of
small earthquakes, and they have lots of tech-savvy
users with smartphones and Twitter accounts. So they created real-time
event detection by social sensors, which basically means
users on Twitter. So they had this simple system. They were looking for
the word earthquake or the Japanese version of it, to see if there is suddenly
a large spike in that word. Now, I mean temporally, I mean
you go on streaming data from Twitter and check how many times
the word earthquake is coming, say by minute,
by five minutes, and so on. If there’s a sudden spike
in the use of that word, there must be an earthquake. They used a very simple system
to detect earthquakes based on that.>>[INAUDIBLE].>>Maybe not, yeah, but at
least that is more scientific, so therefore it’s possible.>>[INAUDIBLE].>>But yeah, one thing, when there was this large
earthquake in 2012 or 2011.>>[INAUDIBLE].>>Yeah. So actually-
>>[INAUDIBLE].>>Yes, so this study did not
comment anything about really large earthquakes. But they said that moderate
level earthquakes, which human beings can understand, but
they do not cause much damage. For those, at least they claim that they
can detect the earthquake earlier than the Japan
Meteorological Agency, okay. Of course, the Japan
Meteorological Agency is more responsible for this. So they have to really figure
out that actually there has been an earthquake. Maybe they don’t publish before
they figure out the Richter scale [INAUDIBLE] at all. But Twitter users [INAUDIBLE]
earthquake, I mean->>[INAUDIBLE]. In the middle of the night
[INAUDIBLE] fell. I was fast asleep, but
suddenly [INAUDIBLE]. And I open my eyes and
I [INAUDIBLE] the entire [INAUDIBLE]
like this.>>Why not?
>>And I go like this and I just shut my eyes and
sat there and I said look no,
I’m not dreaming. Then I woke up and
I walked out of my room and everything was quiet. Nobody say anything,
no noise or anything at all. Exactly, that I
wanted to hear so much to see my [INAUDIBLE]
was I opened Twitter. And sure enough I found
like these 20 tweets. Right, there and then saying that earthquake
in that area etc., etc. And I was able to confirm
that I wasn’t sleeping and dreaming the whole thing. So, there was no
other information. The hotel staff,
nobody was like [INAUDIBLE].>>[INAUDIBLE].>>I guess it was a very,
>>Mild>>Mild. I mean, the thing moved. The whole
>>Yeah.>>the thing swayed.>>Everything but
it must have been fine.>>Maybe in Japan people
have got used to that level of
>>That level.>>Yeah, yeah.>>[INAUDIBLE].>>Anyway, so
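The keyword-spike idea described before this digression (count how often the word "earthquake" appears per minute and look for a sudden jump) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the per-minute counts are made-up numbers, and the rule "more than k standard deviations above a trailing window, and above a minimum count" is a generic threshold chosen for this sketch, not the method of the study mentioned in the talk.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=5, k=3.0, min_count=10):
    """Return indices whose count is far above the trailing-window baseline.

    counts    -- keyword mentions per time bucket (e.g. per minute)
    window    -- how many previous buckets form the baseline
    k         -- how many standard deviations above the baseline counts as a spike
    min_count -- floor on the threshold, so quiet periods do not trigger alarms
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = max(min_count, mean(history) + k * pstdev(history))
        if counts[i] > threshold:
            spikes.append(i)
    return spikes


# Hypothetical per-minute counts of tweets containing the word "earthquake".
per_minute = [2, 3, 1, 2, 4, 3, 2, 85, 140, 60, 5]
spikes = find_spikes(per_minute)
print("spike at minute(s):", spikes)
```

With these made-up numbers the first flagged bucket is minute 7, where the count jumps from single digits to 85. The window size, `k` and `min_count` are tuning knobs that would have to be chosen from real data.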
>>Yeah, so this was one study which
actually opened people’s mind on how this level of real time
information is available. Then that started a lot of
studies on actually using Twitter for
real time information, so. And so as that happens, there
are more coming up, for example, sub-events. So, often an event is a large
collection of different sub-events. So, for example, when a natural
disaster happens in India, one of the sub-events is that
when some prime minister goes there,
that creates more news. People get disturbed that
they are actually hampering the relief operations
by going there. So those kinds of sub-events
are often there. Or there are aftershocks. So during the Nepal earthquake, we know there were several
aftershocks and the effects that they have on
the [INAUDIBLE] operations. So these are sub-events which
need to be figured out. Even detecting the sub-events
is a challenge. And then usually the data
is coming so fast and time is critical. So no one has time to
read everything, so summarized information is needed. That is a very strong
area of research. The Qatar Computing
Research Institute, who do a lot of work
on crisis informatics. They have a lot of
focus on generating good summaries
during these events. Then, again, I’ve told you
about influential users in a social network. I’ve told you about topic
specific influential users. Similarly, there are
event-specific influential users. Maybe these users are not
influential in a normal way of looking. They don’t have lot of
followers or anything. They are not socially
very popular. But then, at that point, that person is important because
maybe he has some resource. He has some knowledge
which others can use. So, I can tell you
just one anecdote. So, on the night when U.S.
was conducting the raids on Osama Bin Laden’s hideout,
it was Abbottabad. Yeah, the name of
the place was Abbottabad. There was apparently one user
in Abbottabad who was tweeting. He had no knowledge
of what was going on, but he was simply tweeting,
I can see lots of helicopters, a lot of activity here,
and so on. And it is said that that night,
he was the only source. No journalists there, nothing. So that night, he got more
than 3,000 followers on Twitter, because he was this
event-specific influential user. He was the only one who was telling something about
what is going on.>>[INAUDIBLE] that he
was when [INAUDIBLE]>>Pardon?>>[INAUDIBLE] When
he was only the only one also away [INAUDIBLE]
>>Yeah, exactly, yes, yes. So these people are usually
called community leaders. So there have been studies on
identifying community leaders during. And finally, I have put this in because
everyone asks me including. So okay,
we are using Twitter for. Who will say whether the
information is genuine or not? That’s a very
difficult question. And it’s so difficult that I
have left it aside for now. I’ll just show you
a couple of tweets. So I think most of you here
are aware of the floods in 2015. So has a crocodile bank. And there was a rumor that floodwaters have breached the
walls of the crocodile bank, and 40 crocodiles are going
around the city now. I mean, people just stopped-
>>[INAUDIBLE]>>Yeah, I mean, people just stopped short of clicking selfies with
crocodiles and posting. And no one did that, but
it was almost like that. So I’m going to show
you a couple of Tweets. These are actual Tweets
taken from during that. I think very well-written Tweets
with lots of details on the road [INAUDIBLE] [LAUGH]
>>I’m sure I can [INAUDIBLE] .>>So these are two
[INAUDIBLE] rumors, and it’s very difficult to figure
out that these are not credible. Very well formed with
lots of details and all. So there is a recent study in
world wide web which says that linguistically it’s very
difficult to figure out that these are not credible. But the social network itself
can sometimes help you. If there is a rumor there
will be some people who are questioning it or
at least denying it. That can be taken as a signal
to at least understand that this might be a rumor. So, I mean true we found
these tweets also. Okay. Now, I do not know. Linguistically, it might be very
difficult to build a classifier on these two tweets and
the blue ones. I mean-
>>[INAUDIBLE]>>Exactly. Or these are false,
these are true. It might be difficult to-
>>[INAUDIBLE]>>[INAUDIBLE]>>Well, but which one would you think more credible?>>That’s important.>>
Yeah yeah yeah. Yeah. No no no, okay, you can differentiate but
which one is more correct? That is ->>[INAUDIBLE]
>>Yeah exactly, exactly.>>I know today, but at that
time, I might [INAUDIBLE].>>Yes, exactly. I think that is the problem. That’s why rumor
detection is much more difficult than spam detection. Spam has a unique
way of posting, and the people have very
different characteristics. Whereas, in the case of rumor, perfectly normal people, maybe
people who are quite credible and trustworthy, they are posting this because
at that point of anxiety, whatever someone hears, they
just forward without verifying. So network characteristics and
other language characteristics usually find it
difficult to distinguish. So whereas spam detection is a
kind of well-conquered problem, rumor detection is
very difficult. Okay, so
till now was the introduction. Now seriously, are you bored? Because after this I have
a couple of studies that we have done. I can briefly introduce you to that or-
>>Yeah.>>Go on?
Okay. So I’ll just briefly
describe two studies. One which was primarily done
on the social network, and the other which was done
mainly on the content. So first is this, the name of the link farming
in the Twitter social network. So what is link farming,
first just let me introduce. So as I said in the web, say in the web, there are lots
of search engines, like Google, Bing and all.
So apart from relevance, they’re supposed
to give a query, they have to give you
relevant results, apart from that, they have to
give you important results also. So how to measure
the importance of a website? Or how to measure the importance
of a node in the web graph? And as I said, there are centrality metrics
like PageRank and so on. Now, spammers have also
developed their own methods to beat these methods. So is one method. Another method is
called link farming. Typically, the same spammer
creates multiple accounts, and they link to each other. So say each account,
or each node now, has lots of other links.
Therefore, these algorithms like PageRank get fooled into thinking this
node is also important. Because typically, if say 100 pages are linking
to your website, maybe these algorithms would think that
your website is important. Now, these 100 webpages are
actually created by you. So, they are linking
to your account. So, that’s link farming. So, there are nodes
which farm links. Wait. Now again there have been
methods which have been found to be quite successful in
detecting link farms. And in the web,
link farming is, more or less, a conquered problem. Right, otherwise
the result would not be so good, like what we see today. So link farming is more or less, sure the challenges exist
but it’s more or less none. But then, no one has studied
whether link farming is going on in social networks
like Twitter. So we know there is a lot of spam,
but is link farming going on? So,
that’s why we did this study and let me tell you a secret
when we started this study. We had no idea that
we would end up, we would be ending up
looking at link farming. We started off as
a normal project and. Okay! So, we started by
identifying a lot of spammers. I’m not going into details,
basically we looked at accounts which had been
suspended by Twitter, and which had posted
blacklisted URLs. We needed the second
step [INAUDIBLE]. So Twitter suspends
spam accounts if people report some account as spam. Twitter suspends that. But we needed this second
verification step also, because Twitter is said to
suspend some inactive accounts. So the account can be suspended
because of spam activity or inactivity. So to ensure that we are
actually studying spam accounts, we saw that they had
posted blacklisted URLs. URLs are blacklisted
by some services. So very malicious URLs. Okay, so we had identified about
40,000 spammers in Twitter, and we were studying
their network properties. This is what first surprised us,
that when we take a random node in the Twitter network,
a random user, the average in-degree is 36, whereas average
in-degree for a spammer is 234. Much higher. This is not something
we would really expect. We won’t expect, so that means many other people
are following spammers. This is not something we are
expecting, but still it’s okay. Maybe the others
are spammers also. Link farming is going on. Spammers have created many
accounts which are following their own accounts. So that was our hunch. So as we saw, a relatively small
set of users, few thousands, follow most of these spammers. So these are the people
whom we call link farmers. They are not just
spammers per se, but people who
are following spammers. So the natural question
we ask is are the link farmers themselves spammers? Otherwise, why would
they follow spammers? But then the answer
surprised us. We saw that 80% of them
are actually real, popular, and active users. And it also includes
verified accounts. You would be shocked if I
tell you some of the names. Let me show you. There was Barack Obama. There was Britney Spears. There was NPR politics. UK Prime Minister,
JetBlue Airways. All these accounts
are actually verified, and they are following
thousands of spammers. So, let’s understand this. In general, in a social network,
people expect that node A will link to node B if
A thinks B is good in some way. So a link is actually
a type of vote. That is why if your website
has links from ten other websites, it’s as if
you have ten votes. So you must be slightly more
important than maybe someone who has just two votes. So usually,
a link is considered as a vote. And if a link comes
from an important person, then that vote is
much more credible. Remember that I was taking
the example of taking recommendations from
faculty members. So students take
recommendations from faculty members because that’s
the kind of vote, okay, I mean, maybe an endorsement
kind of thing. And it’s coming from
an important person, therefore, it’s much more credible. So you’d expect good people to
only link to other good people. I mean, all these network
algorithms like PageRank and all, go with this assumption. That if many good
people are linking, if many good nodes
are linking to another node, then that other
node is also good. But in Twitter,
this seemed absolutely invalid. There are all these
verified accounts which were linking to spammers. And okay, and so what we found
also is that apart from these verified accounts,
verified accounts are few. But the majority of these people
who are following spammers, they were marketers trying
to promote their business. If you look at the other column,
maybe this, you would understand. They are into
affiliate marketing, interested in tech,
social media manager and so on, they are kind of
promoting their business. And that explains
their behavior. So why are popular
users farming links? Or why are popular users
connecting to spammers? The answer is social etiquette. You follow me, I’ll follow you. You give me your business card,
I’ll give you my business card. So we learn Twitter, it’s if
you follow me, I follow you. And if I go back to
the previous slide, you will see some
of them tell that. We’ll follow back. Let’s talk soon. So they are telling into
their profile, follow me, I’ll follow back. So, What is the problem? These people are not
doing anything illegal. I mean, it’s completely within
Twitter terms to follow back. But the spammers are taking
advantage of them. When a spammer creates
an account, they are simply following some of these users
like Barack Obama and all. And they are getting back links. What’s happening is that
the spammer accounts are getting huge values of PageRank and all. Because it’s very easy for a spammer account to get a link
from Barack Obama, it’s easy for a spammer account to get
a link from Britney Spears. This is never
the case in the web. You would not imagine spam,
say you would not imagine say, Google site or Microsoft
site to link to a spam site. You would not imagine that. But in social network we
found that Barack Obama, Britney Spears and all, they
habitually link to spammers because of this
social etiquette. So at that point, Barack Obama
was in his presidential campaign days, so he used to follow back. I mean, maybe he was not
the one handling the account. But then it was part of the
presidential campaign policy, anyone follows me on Twitter,
I’ll follow back. Because I’m also
hearing your feedback. He had to give out that vibe,
so he followed back. So and actually, so we created
one account in Twitter. And we followed some
of these link farmers. So within nine days, our account
reached to within top 2% of Twitter users,
according to PageRank. Right, so
the problem is that, so typically how does a social
network deal with spammers? By suspending them. But this study showed
that detecting and suspending spammers
are not enough. Because it’s easy for the
spammer to create a new account, link to some people, and again
rise in the rankings quickly. It takes about a week to come
up to Twitter’s top 2% users, according to PageRank. So that leads to ultimately
increased spam in Twitter search results, because it’s
easy to game the system. So Twitter search at
that point used PageRank. So it’s very easy to game the
system just by creating a new account and following some
of these link farmers. Okay, so yeah, the challenge
was this, that real and popular users
are engaged in link farming. Therefore, detecting and
suspending spammers won’t help enough cuz you can’t suspend
these real popular users. Because they’re not
doing anything illegal. Following back is not illegal. So therefore,
we thought that okay, we have to design something
to discourage users from following
others carelessly. And so we did something. We just kind of, okay so
that is what we did. We said that we will penalize
your score, so every node has a score according to this
PageRank type algorithms. We will penalize your score if
you are following spammers. So we proposed this algorithm
called CollusionRank. I won’t go into details. This is the reverse of PageRank. So PageRank says that you
are good if many good people are following you. We said, you will be penalized if you
are following many bad people. So PageRank depends upon
who all are following you. You are good if many good
people are following you. We looked at the opposite side,
whom you are following. So we said, we’ll penalize you
if you are following bad people. Right, and then we added up,
PageRank and CollusionRank. So I’m not going into
details of the algorithm, but this is just the result. So you see, the x-axis is
the node rank in percentile. And this y-axis says fraction
within a set of users. So this red line is for
PageRank. It shows that if you just run
PageRank on Twitter network, in the top maybe 10% of the nodes,
there are 20% of the spammers. So 20% of the spammers
whom we could identify. They are within the top
10% according to PageRank. Right, whereas the way
we used PageRank and CollusionRank, it brought
it down to so low. There would still be some users,
which is fine, because Barack Obama
should still rank up here. Just because he’s
following spammers, we should not push him
down the rankings. So some people it’s
okay to find here. But a large fraction of the
spammers we could push down to absolute low rankings, so
the last 10% right here. So this was the study on
link farming in the Twitter social network. To our knowledge, that was the first study on link
farming on a social network. There have been a lot of studies
on link farming in web, but not on a social network. And we found that
it is very different. Because in web, there is no
concept of following back, but in social network, there is,
and that creates this issue. Okay, and the last study which
I will just present here, is the one I was talking
about yesterday. This is a very
recent study just, I mean,
presented last month in ICWSM. So here we are not using
the network at all. But here we’re
using the content. Here we are focusing on
trending topics in Twitter. So trending topics, these are
examples of trending topics on a single day,
some time in April, 2017. So these are example
crowdsourced recommendations. So if something like, if some word is suddenly being
used highly by a large fraction of the people, then Twitter
declares that as trending. Of course,
their algorithm is more complex. I’m not going into details. And frankly, they won’t even divulge all
the details of their algorithm. But in general, it’s that if
some topic or some hashtag, say, has a sudden spike in its use,
then it’s declared as trending. So there have been a lot of
studies on characterizing the trending topics, like say, are most trending
topics from politics? Or from sports or what. There will be studies like this. In our study, we did not focus
on the topics themselves, rather we focused on the people
who make these trending. So typically, a hashtag becomes
trending because a large set of people have posted it. Which is this large set? Right. So basically we have analyzed
the demographics of the crowds who promote Twitter trends. So one intuition can be that
the whole general population is posting a topic. That is why it is
becoming trending. Or a large fraction of
Twitter is posting a topic. That is why it becomes trending. That is the intuition,
but let’s see. So when we say
demographic attributes, we mean these: gender,
race, and age. We have these categories and of course, we had this
talk from Tom yesterday, that author attribution
is a difficult problem. From a post of the author, to figure out these is
a very difficult problem, especially for Twitter with its
noisy vocabulary and all. So we did some
literature survey and found that text-based
methods are really difficult. Therefore, we didn’t
go that way. We used the profile images. So there is this tool called
Face++, which from a profile image, it can tell you these
characteristics of people. Of course there is
a lot of limitation. Often people have
landscape as the profile. Then we don’t get anything. Often people have a group
photo as the profile, then we don’t get anything. Still so we did some
internal experiments and we found that it’s more or
less accurate. So at least when the profile
image has just one picture, it more or
less gives good results. Right. So, but typically, so
we could identify this kind of demographics for
about 55% of the Twitter users, which is not very high, but
well acceptable, I guess. Yes?>>Like when I was
ten years old.>>Yeah, true sure,
all those things are there. So, yes.>>[INAUDIBLE]
>>I but in that case.>>No, no, no, I’m just mine
is essential one, first.>>And by the way, so we also observed one thing that
among these three attributes age is the most difficult
to figure out. From a face, age is the most
difficult to figure out, even for this tool.>>So there was a Microsoft tool
some time back, what’s your age.>>Okay.>>So
it would look at your picture, you could upload
whatever picture and it would track your details,
whether you’re male or female.>>I see.>>And what age you are.>>Okay.>>It wasn’t that accurate,
because->>It has no accuracy.>>And sometimes it would
also get the gender wrong.>>Okay.>>No human would get the gender
wrong, if they saw the picture.>>Yes, yeah.>>So this people quota
here are implement this?>>I didn’t get the question.>>So in Facebook there’s a item
point providing that all of them material like so.>>Okay, I don’t think so,
in case I don’t know. It just has a field in
the profile, which you have.>>[INAUDIBLE]
>>Yes, and many people don’t populate it even, but
as far as I know only one. Okay, but
we found that this tool is okay, we could do the study with this,
okay. So now for our trend, we distinguish
between two sets of people. One is called the promoter, who posted on a topic before
it became trending, and adopter, who posted on a topic
after it became trending. This was what Calica was
talking about yesterday. So, promoters, these are
relatively smaller set of people who are posting on a topic
before it began trending. After it began trending, many other people began
posting about it. So the set of adopters
is usually much larger than promoters, but
they were compared. So just to show what we
are looking at is this: so for example, there is
this hashtag or trending topic
called #wednesdaywisdom. We found that among the
promoters who are promoting this topic, 48% are male and
52% are female. Whereas, for some,
another topic like Wikileaks, 76% are male and 24% are female. So you see not all trending
topics are created equal. There is a huge distinction or
difference, between the demographics of
population who are promoting it. This is about gender. What about their race? So we had these categories,
Black, Asian and White. So for this trending topic,
Comey, the distinction looks like this. By the way, one thing I must
say, which I’m not going to so much detail in
this presentation. But when you see this one
natural question to ask is, is it interesting? When would it not be interesting?>>[INAUDIBLE]
>>So how would it be obvious?>>When the person thinks
it’s the same [INAUDIBLE].>>Exactly,
if the population is like this, then it’s not surprising
that it’s like this, right? So we were taking
that into account. We have a global distribution
of Twitter population, and we have a distribution for
each trending topic. We were measuring
the divergence. So even in Twitter population,
whites is a large fraction. So it doesn’t look,
I mean, it’s not so different from Twitter
population as it seems to be. Because in Twitter
population also, I think, close to 70% would be white. Okay, but yeah, so for
different trending topics, there are distinctions, statistically significant
distinctions, which we measured. Okay, so now we had these
research questions. I don’t go to all of them, I’ll just give you very
brief answers to a couple. So, how different are the trend
promoters from Twitter’s overall population? So for some trends it’s
very similar, but for other trends it’s
much more different. Are there certain socially
salient groups who are underrepresented
among the promoters? I’m not going
into this question. Please talk to me,
if you want more details. This concept of
underrepresentation has a significance in US. They have these laws called
80% laws, which say that, it’s used for things like this. Suppose a company’s hiring, and is hiring males
and females. Someone might say that you are
hiring more males than females. Then the company might say,
we get more male applications, that’s why we
are hiring more males. What’s wrong in it? So to get past these debates,
the US has these 80% laws. It says that, if a certain group
of people is less than 80% of other groups, then one says
that group is underrepresented. So there are these
semi-legal kind of rules. I’m not going into details, that
is what we were checking there. And the brief answer is yes, we
found some demographic groups. Like say, black women,
black middle-aged women are the most underrepresented
group in Twitter. So very few trends are dominated
by black middle-aged women. And then, this is the question
which I was talking about: do promoters and
adopters of a trend, people who have posted before
it became trending and after, differ? And finally, what can promoter
demographics tell about the type of Tweet trend? So just one slide answers
to some of these questions. How different are the trend
promoters from Twitter’s overall population? They’re usually different. So far, 61% of trends, the gender distribution is
statistically significantly different from the Twitter
overall population. For 80% of the trends,
the raised distribution is statistically significantly
different. So, a brief [INAUDIBLE] is yes. For a very large
fraction of the Twitter. So, very few Twitter trends are
produced by a random population of the Twitter population. A random sample of
the Twitter population. Usually almost, I mean, 80%,
60% trends are produced or promoted by a certain
demographic group. Which is very different from
the overall Twitter population.>>How do you
define promoted by?>>Anyone who has
posted on that topic before that topic
became trending.>>And how do you define when
a topic becomes trending?>>No, so we go on streaming,
trending topics from Twitter.>>Twitter’s definition.>>Yeah, of course.>>Can you give us some
statistical evidence?>>Yes, they say that they go
for acceleration kind of thing, so in the last delta time, how
much has the usage increased? They don’t tell us
the actual algorithm. But you can stream Twitter
trending topics at certain points. So Twitter trending topics that,
I think, are given at 15-minute
intervals, so you will exactly know when that
topic first became trending. So all people who have posted
before that are the promoters, and all people who post after
that instance are adopters. There can be common.>>These are the hashtags, so-
>>Not necessarily. Trending topics,
90% of them are hashtags. But some other things
can also come. Usually some names of person
like Narendra Modi and all, these sometimes trend or Trump. Okay, So I’m skipping
the second question. The third question is what we
were discussing yesterday, do promoters and adopters of a trend have
different demographics? Answer is yes. So here the orange bars
give the divergence of the promoter population from
Twitter overall population. So, the Y-axis is divergence;
simply, high divergence means much more different
from the Twitter population. So, the orange bars here show
the promoter demographics. They are much more different
from the Twitter population. And, the blue, or green bars, they show the adopter
demographics. So the divergence
comes down usually. So usually topics become popular
in small demographic groups which are very different from
the Twitter overall population. But once they become
trending everyone starts posting about it. So the adopter demographics
is much closer to the overall population. Yes.>>I know it’s [INAUDIBLE] but
I think you need specific evidence
that [INAUDIBLE]. So the same instance for generality in the [INAUDIBLE]
too big, and so [INAUDIBLE] main side of the
>>Okay, so the divergence measure
that is being used is not one which gives negative scores,
if you mean like that.>>It can not give you
negative [INAUDIBLE].>>Okay, I can not answer that
offhand but at least I don’t remember seeing such gross
differences that before it was dominated by women, after that,
it became dominated by men. I didn’t see that much
gross distinction, sure, more people of
the other side took it up. But I don’t remember
examples where the thing has completely overturned.>>So it’s more like a dilution?>>Dilution, exactly. A dilution.
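The dilution effect discussed here can be illustrated with a small sketch. The talk does not name the divergence measure, only that it cannot be negative, so Jensen-Shannon divergence is used below purely as an assumption. The overall race split (68% White, 14% Black, 18% Asian) is the one quoted in the talk; the promoter and adopter splits are made-up numbers for a hypothetical trend, not results from the study.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    It is symmetric, never negative, and at most 1.
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero probability in `a` contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Overall Twitter race split quoted in the talk: White, Black, Asian.
overall = [0.68, 0.14, 0.18]

# Hypothetical splits for one trend (illustrative values only).
promoters = [0.30, 0.57, 0.13]  # users who posted before the topic trended
adopters = [0.55, 0.28, 0.17]   # users who posted after the topic trended

promoter_div = js_divergence(promoters, overall)
adopter_div = js_divergence(adopters, overall)

print(f"promoters vs overall: {promoter_div:.3f}")
print(f"adopters vs overall:  {adopter_div:.3f}")
```

With numbers like these, the adopter divergence is smaller than the promoter divergence. That matches the dilution reading: the group does not flip to the opposite demographic, it just moves closer to the overall population.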
Demographics becoming more homogeneous with
the overall population. That would be
really interesting, but I’m sure everyone
would have leapt up and down had they seen
something like this. I don’t remember seeing
anything like that. Okay, so the last question
which I’ll just talk about is, it’s okay, we’re doing this
promoter demographics thing. So suppose we find
that something is dominated by a niche community,
what does that usually tell us? So, I mean pretty
intuitive answers. So some of the, many of the
trends express niche interests of their promoter groups. So the typical thought process, we also had this thought
process before doing the study, is that something that’s
becoming trending on Twitter must be universally popular, but
that’s not true. Often the things which become
trending were initially popular in a small demographic. And also, the interesting
thing is there are some events which are controversial
in nature. There, different demographics
start promoting different hashtags, and
any of them can become trending. So the particular event
that we study here is that of a shooting incident
in Dallas, in US last year. So there, that particular
situation was that on a certain day the police killed
few black people suspecting that they
were terrorists. Something like that. Then the black people
organized a rally, a peaceful rally, to
protest against these killings. But on that day, a black army veteran kills
a lot of the policemen. So that week was
extremely disturbed. So there were a lot of
black people saying things like black lives matter. Whereas many other communities
were posting things like All Lives Matter or Police Lives
Matter, and things like that. Because it was, I mean, as if both communities were trying
to kill each other or something. So during those events,
things become more difficult. So okay, so
here is just a comparison. So on the right side, you see
the overall Twitter population. Yes, 68% are White, 14%
are Black, and 18% are Asian. It’s heavily dominated by US,
that’s why it’s this. Whereas, this hashtag, or this trending topic,
#BlackWomenAtWork. This is definitely
of niche interest. So you have a huge fraction
of Black users posting it. Okay, I’m not showing
you the gender thing, but more women were
posting it among Blacks. So again, some trends differ heavily from
the Twitter overall population. Yeah. And the last type is this. So this was during the Dallas
shooting in July 2016. I’m showing you two hashtags,
#BlackLivesMatter and #PoliceLivesMatter. If you see, the Black Lives
Matter is dominated heavily by blacks, so 57% are blacks. Whereas, in case of
Police Lives Matter, 74% are white and 13% are black. So this was as if two
opinions fighting it out against each other. So one utility of this study is
suppose you were just seeing the trending topics. That doesn’t give you
the complete information. So we believe that it’s
important also to know that which community has made this
trending to get the perspective. So if you get these pictures,
then you know, okay, maybe you are able to understand in this
case because it’s very clear. Black lives matter and
police lives matter. But in some cases from
the hashtag itself, it might not be very clear. For example, the Democrats and
Republican question. So which are being promoted
by the Democrats and which are promoted
by the Republicans? Or say here are Congress and
maybe? These are not always clear. So this kind of analysis should
also accompany those trending topics to know that which
community is promoting trends. Okay, so thanks a lot,
for the patient hearing. If there are any questions on this session,
I’ll be happy to take them.>>So the last study, so I can
see how interesting it is from a social point of view like
that, the point that you said, in that perspective. But as an application, suppose you were trying to build
an application on top of it. So do you have any ideas?>>So one thing is that
to inform people that not all hashtags are different. The other thing is like community
specific recommendations.>>[INAUDIBLE] Showing
them some [INAUDIBLE]>>Yes, yes, yes. That statistic developed
as part of the study. But one immediate application
would be community specific recommendations, so community
specific trending topics. If someone wants to know, so for example maybe a black
woman is seeing Twitter. She might not be very
interested in the whole global sort of trending hashtags
where she doesn’t identify with most of them. Rather if she wants, we can
give her option of trending topics in her community. So people that her
community is talking about.>>So doesn’t itself select? This is like learning of
the problem that we all live in our bubbles and that-
>>Exactly, so yes, yeah, so that is something which we have
been studying the last couple of years. We do not take
judgmental stance, so we don’t say this is good or
this is bad. The thing that we mainly say is,
there should be transparency. So at least a user
should understand what I am seeing, what is
the algorithm producing it. After that, suppose someone
understands everything about filter bubbles and all. But still, he or she is free to say, I
want to be in my filter bubble. I want to see only one
side of the story. That’s fine, I mean,
it’s personal choice. After all, recommendations
are personalized, so it’s personal choice. But at least, that person should be aware that
there is another perspective. There are things beyond what
I’m hearing from my community itself. So we stop at making things
transparent, but we, yeah, but. I mean, yeah. And so again, so if we build
a community specific [INAUDIBLE] recommendation system,
maybe someone might use it to get more into the filter bubble
by only checking things which are popular in his or
her community. Whereas, some other user can use
the same service to check out things which are trending in
different communities and understand different
perspectives.>>[INAUDIBLE]
will be aware that they’re [INAUDIBLE]?>>Yes.>>Did not go.
Did you not think that [INAUDIBLE], and
then [INAUDIBLE] different, too.>>So the problem today is that
often our world view is shaped by what Facebook
algorithms are showing us. We don’t even know that
something beyond exists. We want to reach that
level where at least users know that there
are different perspectives. Then they’re free to choose
whether they want to go further into the bubble or
take a balanced view.>>And the variables that you
are showing, the gender, race, those are probably easy to
identify from the profile pic and other information
on profile. But often these bubbles
are around other kinds of ideologies. Do you plan to-
>>Exactly, so we are doing some study on
the political aspect at least, like the Liberals and
the Democrats. So there is, as you said, it is
difficult to even understand the psychology of a person. So, when you’re talking
about political bias, the first thing we come across
is, it’s very subjective. So suppose we
are doing MT studies. Now, the MT workers will
also have some bias. So suppose we showed them
a tweet and ask them, do you think it’s biased towards
Democrats or Republicans? Now that person’s bias will
interfere with his judgement. Right, so suppose a very
crude example. Suppose someone sends a tweet
saying, Democrats are the best, whereas Republicans
are absolutely garbage. We show this to our MT worker
and ask, is this biased? If that person himself is
Democratic-leaning, he will say, no, this is a fact, why should it be
biased, yeah, we are the best. Whereas if you show it to
a Republican worker, then the opinion
would be different. And we are not allowed to ask,
I think. Okay, we are allowed to ask, but we cannot force them
to tell us their bias. While doing this
kind of survey, we cannot force these subjects
to tell us their bias. Interestingly, Christian
had done another study. So they finally said that, let’s
not ask even this question. So if you show a tweet or
something or a text and say, is it biased, the answer would
be very much colored by the bias of the person taking the survey. They asked a different question. The question they asked is,
see this text? Is it clear to you
that it was posted by a person of which leaning? The difference is subtle,
but there is a difference. One thing is
showing a text and asking, do you think this is biased
towards side A or side B? The other is showing a text and
asking, see this text? Is it clear to you
which leaning the person who posted it has? If it is, then the text is biased. If someone, by just seeing
a text, can say, this must have been posted by
a Democratic-leaning person, like in the example
which I gave. So there is a text saying,
Democrats are the best, whereas the Republicans
are absolute garbage. If I show you this text and ask, is it clear to you which side
this post has come from, irrespective of your bias,
you will say this must have come from a Democratic-leaning
person. That is the argument that
they raised. And so they changed the question. Nothing can be foolproof, but
yes, they have a point.>>So, they didn’t even ask
the question of what is correct or what is wrong,
what is in between at all?>>The questions they
asked are different. This particular part was
contributing a methodology to figure out whether
something is biased. Till now, the methodology has been
to ask people, is this biased? But they were saying,
let’s not use that methodology. Rather, let’s change
the question in this way. But then they studied
other things. Like, say,
before a controversial event and after a controversial event,
how things changed. Also another interesting
study they showed is usually which news headlines
become most popular, the biased ones or
the neutral ones? And what would be your guess?>>Biased.
>>Biased ones were a lot more popular. So those were the kind of
questions that they studied.>>And is there unbiased news,
does it exist? I mean-
>>So [INAUDIBLE] all the US newspapers, they’re known to have some
kind of political bias. Okay, there are neutral sources, like the New York Times
is quite neutral.>>No, not really.>>At least the official
scores start to have, it’s close to 0.5, but
there are news agencies very towards hundred [INAUDIBLE]
>>[INAUDIBLE].>>The degree of
bias might be there.>>Yeah.>>So there’s a computer model.>>No, it’s not [INAUDIBLE]
it’s also like I don’t know if the paper you
decide to publish [INAUDIBLE]>>Yeah, they do-
>>[CROSSTALK]>>Something is [INAUDIBLE]. When we say bias, we always
think of something really negative, that,
my God, it is biased. But everybody has a bias. [INAUDIBLE] you have a bias. So [INAUDIBLE] and Michael are-
>>You can [INAUDIBLE] biased, right? If you have an opinion,
you’re biased. If you don’t have an opinion,
then->>[LAUGH]>>You need bias for machine learning. Machine learning doesn’t work
if you don’t have a bias. That’s how it works. [LAUGH]
>>And people are really negatively biased
towards this term called bias. Every time you mention
that in a paper, chances are rejection
goes higher.>>I think they should come up with a new-
>>Yeah, so reviewers have actually told
us in the reviews,
this is not bias, don’t use the term bias. Use something else. [INAUDIBLE]
>>People are really sensitive around that topic. So leaning or not,
people are willing to accept. Bias has this very negative
connotation to it.>>Did you try to extend
these experiments with Indian examples or trends?>>I mean, okay,
that’s a good question. We did think of that, but
we have not started it. Because if you look at it,
the differences between Democrats and Republicans
are well documented for the US. And second,
they’re not limited to politics. It has been said that the
kind of food they prefer is different, the kind of sports
they prefer is different. It’s been at least said, and you
can sometimes make it out. But in India, we started this study but
then backed off. The differences are not so much. Often they are caste-based
leanings, which are very difficult to figure out from
text or things like that. I mean, the text doesn’t
give much difference. Whereas, if someone is,
maybe, higher caste, lower caste, those kind of things
come into the picture.>>So like politics is in some way less ingrained in
other social aspects. It might be changing.
>>It might be changing, sure. But, yeah,
we have run into those difficulties.>>I don’t have a documented
thing to correlate it with.>>Yeah,
that would be actually good. Yeah, so either politics and
social factors, maybe they are too complex but
it has not been figured out, whereas there is a lot of
study on Democratic and Republican leanings,
[INAUDIBLE] etc. I saw newspapers in the US and all the whole [INAUDIBLE]
not in Indian context.>>[INAUDIBLE]>>[INAUDIBLE]>>When your sales and the existing [INAUDIBLE]
they like to say, we always say [INAUDIBLE]
is biased, but [INAUDIBLE].>>And [INAUDIBLE] that.
>>Yeah, so the [INAUDIBLE] we might
say all these things, etc., but there is no study that.>>There is no study. [INAUDIBLE]
>>And to my knowledge, that is a different is another
thing that are not fully aware of, and not seen anything.>>How
>>From I don’t think->>I don’t think->>That might be the difference.>>Might, sure.>>There’s a lot of
people that pull out.>>I mean, you have thrown
regulation changes with everything there is.>>[LAUGH]
>>But those are very
>>Stick there. I guess it still
says most people. [LAUGH] Either [INAUDIBLE],
because, instead of their resultant
uplifting [INAUDIBLE]. Mexico is one,
applications switch bodies if they’re not super
really awesome. [INAUDIBLE]
>>Yeah, [INAUDIBLE]>>[INAUDIBLE]>>I mean, ideologies are only now beginning to
be set in stone, right? Earlier, everybody
had a similar kind of don’t like I’m going to win, and
I move the big button all right? So only now we’re talking
about Across the board. I don’t think a lot of are people are taking
across the board. Yeah, see much more
ideological framing. [INAUDIBLE]
>>Very interesting analysis of [INAUDIBLE] economics practical
that help [INAUDIBLE]>>[INAUDIBLE] to politics.>>Going forward, it might be [LAUGH]
>>Yeah, I mean, whether you want to see
the inquiry or not. [LAUGH] Ooh class
it’s true, I mean, for the last couple of years we are,
we are like, we have been avoiding studying topics
which are politically related.>>Yeah, I can’t imagine.>>Yeah, I mean, there are a lot
of topics to study while in>>Why you choose, those kinds of things, yeah.>>A case in point is the person
who did the evaluation of the EVM machine, right?>>Yeah.>>And he’s hounded like anything, and he’s actually
assigned to just study.>>Yes. Great and it really [INAUDIBLE].>>Even our department did the
study and they [INAUDIBLE] but they chose not to report it
because they were scared, since we live in, it’s
not a safe environment. The head of the department
didn’t allow it to go out and this is very good. Okay, thank you.>>[APPLAUSE]
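
The community-specific trending topics idea raised in the first answer of this Q&A can be sketched in a few lines of Python. This is only a minimal illustration, not the statistic developed in the study: the function name `community_trending`, the `(community, hashtag)` input format, the `min_count` cutoff, and the lift-based score (a hashtag's share within a community divided by its share overall) are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def community_trending(posts, top_k=3, min_count=2):
    """Rank hashtags per community by how over-represented they are there.

    posts: iterable of (community, hashtag) pairs (hypothetical input format).
    Returns a dict mapping each community to its top_k hashtags.
    """
    global_counts = Counter()
    per_community = defaultdict(Counter)
    for community, hashtag in posts:
        tag = hashtag.lower()
        global_counts[tag] += 1
        per_community[community][tag] += 1

    total = sum(global_counts.values())
    trending = {}
    for community, counts in per_community.items():
        community_total = sum(counts.values())
        scored = []
        for tag, n in counts.items():
            if n < min_count:
                continue  # skip hashtags that are too rare to be meaningful
            # Lift: share inside the community relative to the global share.
            lift = (n / community_total) / (global_counts[tag] / total)
            scored.append((lift, n, tag))
        # Highest lift first, then highest count, then alphabetical order.
        scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
        trending[community] = [tag for _, _, tag in scored[:top_k]]
    return trending
```

A hashtag that is popular everywhere gets a lift close to 1, while one used mostly inside a single community scores higher, which is one way to surface what a community is talking about rather than the global trends. The same output could serve either use mentioned in the talk: looking only at one's own community, or browsing what other communities are discussing.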

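The reworded survey question described in the talk (is it clear which leaning the poster has, rather than is this biased) can be turned into a simple score. The sketch below assumes each annotator answers "D", "R" or "unclear" for a text; the function name `leaning_identifiability`, the label set, and the 0.7 agreement threshold are hypothetical and are not taken from the study mentioned by the speaker.

```python
from collections import Counter


def leaning_identifiability(guesses, threshold=0.7):
    """Decide whether a text reveals its author's leaning.

    guesses: list of annotator answers, each "D", "R" or "unclear".
    Returns (side, agreement, is_biased), where agreement is the fraction
    of all annotators who named the most commonly guessed side.
    """
    if not guesses:
        return (None, 0.0, False)
    named = Counter(g for g in guesses if g != "unclear")
    if not named:
        return (None, 0.0, False)  # nobody could tell which side posted it
    side, votes = named.most_common(1)[0]
    agreement = votes / len(guesses)
    # The text counts as biased when most annotators, whatever their own
    # leaning, can tell which side it came from.
    return (side, agreement, agreement >= threshold)
```

For the "Democrats are the best" example from the talk, most annotators would presumably answer "D" regardless of their own leaning, so the text would be flagged; a neutral headline would mostly collect "unclear" answers and would not.
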
3 thoughts on “Social Network Analysis”

  1. mmmmm – the amount of hedging and hanging statements suggesting uncertainty ('or something') kind of undermines trust in what the speaker is saying.

    Lazarsfeld and Merton (1954) came up with homophily. It's in the Wikipedia article on the concept
