Background on Classifiers

Sometimes I think we should give up on classifying bad actors in SurveyMan – it’s a very hard problem, and developing solutions feels very far afield from what I want to be doing. Then I see blog posts like this one that show the limitations of post-hoc modeling and the pervasiveness of bad actors. As some of the commenters pointed out, if people were responding randomly, we could treat their responses as noise. The trouble is that people do not, in fact, respond randomly – some questions may be more likely than others to elicit “incorrect” responses, which puts us in bias-detection territory.
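
To make that distinction concrete, here is a minimal sketch of the check the random-responder assumption would license. It is not SurveyMan’s classifier; the data format and function name are assumptions for illustration. If suspected bad actors really answered uniformly at random, their per-question response counts would pass a goodness-of-fit test against the uniform distribution; questions that fail the test are the ones pulling responses in a particular direction, i.e. introducing bias rather than noise.

```python
from collections import Counter
from scipy.stats import chisquare

def flag_biasing_questions(responses, num_options, alpha=0.05):
    """
    responses:   dict mapping question id -> list of option indices chosen
                 by suspected bad actors
    num_options: dict mapping question id -> number of options for that question

    If bad actors were answering uniformly at random, every question should
    pass the goodness-of-fit test; questions with a low p-value are eliciting
    skewed responses, i.e. bias rather than ignorable noise.
    """
    flagged = []
    for qid, choices in responses.items():
        k = num_options[qid]
        if k < 2 or not choices:
            continue
        counts = Counter(choices)
        observed = [counts.get(opt, 0) for opt in range(k)]
        # chisquare defaults to a uniform expected distribution.
        _, p_value = chisquare(observed)
        if p_value < alpha:
            flagged.append((qid, p_value))
    return flagged
```

Of course, correcting for multiple comparisons and identifying the suspected bad actors in the first place are exactly the hard parts this sketch skips.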

[Read More]


My Personality

To get a feel for a background paper on identifying careless responses, I took the authors’ personality inventory. For funsies, here’s what it came back with. My non-average traits are:

| Domain | Subdomain | High | Low |
|---|---|---|---|
| Extroversion | Assertiveness | | |
| Agreeableness | Morality | | |
| Agreeableness | Altruism | | |
| Agreeableness | Cooperation | | |
| Conscientiousness | Self-efficacy | | |
| Conscientiousness | Orderliness | | |
| Conscientiousness | Dutifulness | | |
| Conscientiousness | Cautiousness | | |
| Neuroticism | Anger | | |
| Neuroticism | Vulnerability | | |
| Openness to Experience | Artistic Interests | | |
| Openness to Experience | Intellect | | |
| Openness to Experience | Liberalism | | |


[Read More]


Crowdsourcing system basics

Some time ago I started writing up the requirements for a sound crowdsourcing system. Last year I wrote about Amazon’s report on using verification tools at scale, and it got me thinking about the kinds of abstractions one would need to formally verify a crowdsourcing system.
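
As a rough sketch of the kind of abstractions such an effort might start from, the types and invariants below are hypothetical placeholders, not anything from SurveyMan: a verification effort would begin with core entities (workers, tasks, assignments) and the properties a sound system should maintain over them.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Worker:
    worker_id: str

@dataclass(frozen=True)
class Task:
    task_id: str
    assignments_requested: int

@dataclass
class Assignment:
    task: Task
    worker: Worker
    response: dict = field(default_factory=dict)

def respects_assignment_cap(task: Task, assignments: list[Assignment]) -> bool:
    """Invariant: a task never collects more assignments than are currently requested."""
    return sum(1 for a in assignments if a.task == task) <= task.assignments_requested

def no_repeat_workers(task: Task, assignments: list[Assignment]) -> bool:
    """Invariant: the same worker never completes the same task twice."""
    workers = [a.worker for a in assignments if a.task == task]
    return len(workers) == len(set(workers))
```

Expressed this way, the properties one would hand to a verifier read as simple quantified statements over these types.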

[Read More]


Smarter scheduling in SurveyMan

Conventional wisdom (and testimonials from researchers who have been burned) says that time of day can introduce bias into crowdsourced data collection. Right now, SurveyMan posts a single HIT per survey, requesting \(n\) assignments. If we collect \(n\) assignments and find that they are low quality, we ask for more by extending the HIT.
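
For reference, here is a minimal, runnable sketch of that extend-on-low-quality loop. Every function name is a placeholder I’ve made up so the control flow stands on its own; none of them are SurveyMan’s or MTurk’s actual API.

```python
import random

def post_hit(survey, assignments):
    # Placeholder: stand-in for posting a HIT requesting `assignments` assignments.
    return {"survey": survey, "assignments": assignments}

def collect_assignments(hit, n):
    # Placeholder: pretend each response reduces to a scalar quality score.
    return [random.random() for _ in range(n)]

def extend_hit(hit, additional_assignments):
    # Placeholder: stand-in for extending the existing HIT.
    hit["assignments"] += additional_assignments

def fraction_valid(responses, cutoff=0.3):
    # Placeholder quality check: what fraction of responses clear the cutoff?
    return sum(r > cutoff for r in responses) / len(responses)

def run_survey(survey, n, quality_threshold=0.8, max_rounds=5):
    """Post one HIT for n assignments; keep extending it while quality stays low."""
    hit = post_hit(survey, assignments=n)
    responses = collect_assignments(hit, n)
    rounds = 1
    while fraction_valid(responses) < quality_threshold and rounds < max_rounds:
        extend_hit(hit, additional_assignments=n)
        responses += collect_assignments(hit, n)
        rounds += 1
    return responses
```

The time-of-day concern suggests one natural variant: rather than extending a single HIT in place, stagger smaller batches of assignments across different posting times so that no one time of day dominates the sample.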

[Read More]