Repurposing Social Media for Social Research? Five Critiques

What I’m going to do this morning,
briefly, is introduce some critiques of social media research. So I’m not gonna critique
social media, I’m not gonna do platform critique, but I’m gonna do research
critique. Now, embedded in the research critique will be some critiques of
Facebook policy or Twitter policy or things like this, but that’s
not the point here. The point is rather: What are the implications for
doing social as well as cultural research when using social media
platforms? And one of the things that I have been a champion of starting with the
digital methods book that I published in 2013 with MIT is the notion of
repurposing. So digital methods as an idea was built on the notion of using
existing online data that was collected for other purposes and then repurposing
it for social research. Now, recently this particular idea has come under fire, and
so what I want to do is talk about why that is and not necessarily
provide solutions. I’m just gonna talk about why it is that there’s this issue with repurposing online data or traces, the study of
digital traces, why that has become problematic, and then perhaps in the
discussion we can start talking about what to do about these problems. But I
just want to put out the problems and open up a discussion. So that’s the
purpose of what I’m gonna do today.

So I’m going to touch on five points. Five
discussion points. First, I’d like to talk about why it is that social media has
oftentimes been criticized for not being “good data”. The second one is in
reaction to quite a well-known article that Metcalf and Crawford published
recently called ‘Where are human subjects in Big Data research?’. So this is
part of the ethics turn in social media research and in
online research more generally, an ethics turn which goes back I think at least a
decade or so but really has come to the fore in the last couple of years and,
especially now in the last couple of days, with this idea of Facebook being
held accountable for having a researcher harvest something like 50 million
profiles, which at the time, in 2013 and 2014, as you probably all well know, was
perfectly doable. So I don’t know if you guys have done it, I’m not saying that I
did, but this is something that now has
extraordinary attention on it. The third one, and this issue has been lurking in the background
for quite some years, is the idea of social media platforms as
proprietary platforms which have very, very different goals than
research goals. Of course that’s a bit black and white, because there are many
data scientists working and publishing at these companies, and a lot of the
data that’s being collected is actually collected for dual purposes. But
nevertheless. So the fourth one is this repurposing issue, which I’ve already
introduced. And then the fifth one is, and some people talk about it as a viable
way forward: should we be pursuing, as researchers, alternatives to the current
online social media platforms for data collection for our own research,
and what are some of them and what can we do with them?

So, OK, let me just start with this idea of good data. The first point that’s often made in
the critique of social media research is that social media
platforms are not research instruments that are set up for the purposes of
doing research. So they aren’t sensors for collecting carbon dioxide
levels in the air, no. They’re set up
for something else, and if we were to think of them as being interesting in
terms of social sensors, for social listening as it’s oftentimes called, what
are the problems with doing that? Well, the problems are that they are unstable,
so they’re not good data in the sense that the data remains clean or
the data fields remain the same over time. And they’re not additive either:
some fields disappear and other fields appear. And then when they do, there’s
what you could call interactive complexity in the
data collection, so certain data fields that were collected previously, such as Likes, are
then affected by new data fields that are introduced, such as Reactions. So then if you
look at Likes over time you have some issues that need to be resolved. But it’s
not only that the data fields change. The inbuilt metrics
that are used, which could also be considered to be data, also change. I
don’t know if you are familiar with the work of Jonathan Albright at the Tow
Center at Columbia, who is the one who looked up the six Russian disinformation
pages that were known at the time, last October:
“Blacktivist”, “Heart of Texas”, “United Muslims of America”, these sorts of things.
When he researched the amount of ‘interactions’ on Facebook that these
Russian disinformation pages got, what he found, when he was doing the
work, was that the actual metric that was used at Facebook, on CrowdTangle itself, changed
three days later. So the metrics also change. The second one is,
you’ll also hear in the news right now that the researcher at Cambridge University who fed, one way or
another, the Facebook data to Cambridge Analytica broke Facebook rules. So if
you go to the various platforms, they have various rules. The Twitter Rules, that’s
where the term really comes from. Well, these rules actually don’t work so
well for research, for our research. So when Twitter accounts are deleted or
suspended, for example, the data that we hold that has been deleted on the
platform, we’re supposed to delete it as well. And then if we don’t, we break
the rules. Or, for example, if I collect tweets
that were suspended in Germany because they are extremist tweets and I have
them in the Netherlands, it’s a bit of a gray area, the extent to
which I could use them for my research. So we could all say: oh,
you know, we have no problem breaking these contracts that we enter into as we
browse or as we download data; in the past we did that all the time. But
it’s increasingly an issue. The one that people oftentimes
don’t talk about as much is that privacy, which is normally this thing that’s held
out as something that everyone should respect, well, privacy settings actually
kind of sully our data, if you will. So if everyone started to turn up their
privacy settings we would get less data. And data becomes uneven because of uneven privacy settings. Okay.

The second one: human subjects. It’s interesting that for a number of years
some researchers would argue, whether in print or elsewhere, that the fact
that users have signed on to terms of service on Twitter, on Facebook and
elsewhere provides a cover for researchers. Because it says, in the
contract that users enter into with the social media platforms,
quite clearly and over and over again that this data can be used not only for
the improvement of the software but also for marketing research and for other
research purposes, the researchers are thus in the clear.
Now that particular idea that we are complying as researchers has also come
under quite some scrutiny recently, not only with the development of
particular kinds of ethical guidelines by, for example, the Association of
Internet Researchers but also through increasing work in the
area of ethics as applied to our data, data ethics let’s just call it. And the
notions that are being put forward are a feminist ethics of care and
contextual integrity or contextual privacy, the notion that was put
forward by Helen Nissenbaum. The ethics of care goes back to the 80s, to
Carol Gilligan and others. So there’s a very, very different sort of
ethics being put forward now to treat data, and when we move to the
idea of an ethics of care or the idea of
contextual privacy, which I’ll say something about in a second, these ideas
are oftentimes considered incompatible with big data research. So how do you do
big data research if you, for example, want to respect the contextual privacy
of others? Contextual privacy in this sense means that the users did not
expect their data to be used out of the context in which they
were giving it. That is, when you tweet you don’t think of this tweet then
being analyzed by an academic researcher. The third one that people oftentimes
don’t talk about, which is kind of interesting to me at least, is: should
social media users be treated not only as subjects, which anyway they should
be, but also as authors? So if you use their data, are
you citing them or quoting them? And it’s interesting that in a
Twitter and Society volume that was published in 2015, I think it was, one
particular legal scholar took up this question and said: well, a tweet should not be considered as being authored because it was not the product
of the sweat of one’s brow, so it wasn’t authored in a kind of traditional
sense of what one considers to be an author’s work.

The question of the impact of
proprietary data or proprietary operations on research: I think it’s kind
of interesting when you talk about social media companies these days.
Obviously over the last 10 years or so social media data have
been increasingly commodified by the companies. So they are increasingly packaged and sold as
products. That much is clear. But that doesn’t necessarily interfere with
one repurposing it if one can still get a hold of it. But the amount,
or the quality in particular, of free data has
waned in comparison with the rise of the proprietary data and the
price tag on it. So that’s one impact: the quality goes down. The
second one is that one could say: okay, but we can go back in time and get the
data. Well, up until January of this year there was this hope that the Library of
Congress would hold all of Twitter’s archive, and they announced
that they’re stopping doing that and they’re moving over to a sort of
selection policy, sort of curated data sets. So they will no longer hold a
complete archive. And anyway they couldn’t handle it. So no
researcher ever got access to the Library of Congress Twitter archive, as
you probably know, even though it was thought that it would
be doable somewhere around 2015. But in any case the archives are now
held by the companies, and of course those archives are then
updated by the companies, right? So when you go to Facebook and you say:
Where are the deleted Russian disinformation pages? You can’t go to the
Library of Congress and say, well, are there any archives there? They’re gone.
I mean, they’re gone publicly.
So this becomes quite a serious issue for academic research. The
other one that I want to mention is that science is treated by social media APIs
the same way as a marketing company, and then if we get too
much data, if we exceed the rate limits, exceed the speed at
which you’re able to collect data, we’re treated as a
kind of spam, like a spammy case, you know? ‘Oh this is, uh, this is like an
overzealous marketing organization or some sort of …’ you know? You’re
breaking the terms. And then what’s in the backs of the minds of the company of course is that people are trying to get the data to
resell it, so science becomes the equivalent of a
dubious marketing company, right? So we’re spammy. And so I
did quite some negotiations as a member of the Association of Internet
Researchers with Twitter, because we use a lot of Twitter data, other
academics have as well, and what Twitter has come up with most recently, and they
announced it and it broke, I think, probably a lot of people’s tools
for collecting Twitter data, but anyway they rolled out this new
model and now we’re treated as consumers of a freemium service. So we can do 50 historical queries, but above that the price goes
up.

Yeah, I’ll be quick, I’m going overtime. Okay: repurposing. So the arguments
around the difficulties with relying on the very idea of repurposing
traces or online data collected for other purposes to do social and cultural
research: that idea has been critiqued recently quite forcefully, and one of
the points that’s been made is that, well, platforms are in a completely
different business. They’re not in the business of science, they’re in the
business of data extraction, they’re one
of the new extraction industries. And they don’t crowdsource, they crowd-fleece;
this is a term that Trebor Scholz and others use. They’re in the
business of crowdfleecing. So this idea that you would rely
on an instrument whose politics are themselves quite problematic would
be problematic. It’s as if you’re complicit, or normatively
dubious. So this is the critique, and that’s a very, very different spirit, it’s a
very, very different set of norms from what science has. I’m talking about kind of
Mertonian norms, the norms of science.
And so the second one is that when we’re studying the social
through platform data we’re getting the social via a kind of
advertising logic. So the platform is built to extract data in order to
advertise to others rather than for other reasons, like to enhance the public
sphere or what-have-you, and so the prism through which you’re studying the
social is this one. Finally, I don’t know if this last
point goes under this particular heading, but I think so. So when we’re trying,
when we’re really working hard, to make social media APIs productive for research, for
science, we compromise ourselves. So I mean, I don’t know how many terms
of service I’ve broken, you know, collecting search engine results. And
then you’re actively worked against by the companies.
So they’re actively blocking you, and the more they block you, of course, the
more you’re like: mmm, we’ll think of another workaround. So you’re being
pushed in a particular way so that you’re in fact almost being
lured into compromising yourself. So where do you draw the line as they block
and as they rate limit, etc.? So this is a tricky issue.

Okay, so I want to just conclude really
quickly with the question of alternatives. I don’t know if you’re
up on notions of, for example, platform co-ops and others. So there have been a
series of proposals put forward for changing the online landscape
for a number of reasons. The first one is that it has been observed by
Tim Berners-Lee and others that the web is in decline, and one of the major
reasons is the rise of the social media platform, sucking the people in,
locking them in, and the result is a fallowing web. So if you go sector
by sector across the web, the non-governmental web doesn’t look very
healthy. The governmental web, OK. Commercial web, perhaps OK. But other ones … so this is more of an empirical question, this is an observation I’m making. But
the health of the web is an issue, and so a lot of people argue: well, it
doesn’t have to be that way, and we in science and education could be a driving
force for good, for a public and an open web versus a proprietary and a closed
one. Trebor Scholz, in arguing for this platform cooperativism, goes much farther.
He doesn’t sort of buy into the
argument that users can effect much change. You have to own
the platforms in order to change them. The other one is, so, should we be
doing research on that? So if you compare the number of articles about alternative
social media, or even what I once called secondary social media, smaller ones, if you
compare the amount of work being done on that to the amount of work
being done on Facebook, I mean, it obviously pales in
comparison. And then finally, some alternatives in our own sector:
academics are also users of social media, academic social
media, so ResearchGate and the rest, and there are alternatives such
as ScholarlyHub. Is anyone on ScholarlyHub? Anyone? Okay, maybe there are others.
But in any case, this is a question to be put nowadays
to social media research, and I guess this is where the critique of social media research and the critique
of social media researchers then concludes: whether or
not we as researchers should strive to research and also promote
alternatives. Okay, thank you very much.
