Ad verba per numeros

On the lack of citation to Morstatter et al. 2013 in DiGrazia et al. 2013

Thursday, August 15, 2013, 10:06 PM

Warning: This post is a follow-up to this one. Unless you have read that one first, you probably won't get the point of this one.

In my previous post I said that one of the issues with the DiGrazia et al. (2013) paper is that they did not mention the work by Morstatter et al. (2013).

The latter work is crucial because it shows that results obtained with the public Streaming API ~~gardenhose~~ (i.e. the randomly sampled 1% stream of tweets that most researchers use) are quite different from those obtained with the firehose (the whole stream of tweets).

In my previous post I said that, on the basis of the work by Morstatter et al., DiGrazia et al. should have at least acknowledged that they didn't know how representative their data were. [Addendum, August 18: However, as Alex Hanna accurately pointed out, this would not exactly apply to the study by DiGrazia et al. since they used the gardenhose (10% sample) and not the 1% public Streaming API.]

To justify that, I simply said that the work by Morstatter et al. preceded that by DiGrazia et al. However, Emilio Ferrara told me that I was wrong about that, since the work by Morstatter et al. was published in July and the first draft by DiGrazia et al. was published in February.

I was sure, however, that DiGrazia et al. were aware of that work because I had pointed them to a preprint on April 25. That was why I was still pointing out that flaw in the paper.

However, I was not aware of the deadlines for the annual meeting of the ASA: papers had to be submitted by January 9, 2013, and decision letters were to be sent to authors by March 18, 2013. Besides, the final program for the conference was to be announced on April 30, 2013, so I assume that authors had to submit the camera-ready version of their papers between late March and early April.

So, in short: the work by Morstatter et al. was available online at least as of April 25, and DiGrazia et al. should have known of it at least from that date because of my e-mail. However, it's very likely that by that day they had already submitted the camera-ready version of their paper, and that would explain the lack of that reference in it. [Addendum August 18] Besides, as mentioned above, it's debatable whether the findings by Morstatter et al., who compared the public Streaming API with the firehose, apply to the gardenhose employed by DiGrazia et al.

Because of this, I have struck through that specific piece of criticism in my previous post.

Nevertheless, the findings by Morstatter et al. are a source of concern for all of us working with Twitter's public Streaming API (the 1% sample), and maybe even for those working with the gardenhose (the 10% sample), since we simply don't know what biases these samples exhibit when compared with the whole firehose.
