import CSV
using DataFramesPilot 1, cleaning the data
Clean the raw data file, and lays out the assumption behind the cleaning.
# Functions
include("PilotFunctions.jl")transform_labels
Constants
const n_pages = 7;Reading the Data
Time Spent
Read the table where the time spent on each page is kept. Then apply the function treatment to each subject.
timespent = CSV.read(joinpath("..", "Data", "OriginalData", "TimeSpent (accessed 2020-04-30).csv"), DataFrame, normalizenames = true, stringtype=String)
timedata = combine(groupby(timespent, :participant_code), x->treatment(x, n_pages - 1));
# timedata = by(timespent, :participant_code, x -> treatment(x, n_pages - 1));Raw Data
Read the raw data (after the anonymization)
rawdata = CSV.read(joinpath("..", "Data", "OriginalData", "Pilot1RawData.csv"), DataFrame, normalizenames=true, stringtype=String);Aggregate Time and Raw Data
data = innerjoin(rawdata, timedata, on = :participant_code);Cleaning the Data
Finishing the Experiments
Removed subjects who have not finished the experiment
Check that the raw data and time date coincide on who finished the experiment.
Then keep only subjects who have finished the experiment, and remove the finished column as a consequence.
@assert (data[!, :participant_index_in_pages] .== n_pages) == data[!, :finished] "The method used to determine who has finished is not correct"
data = data[data[!, :finished], :]
select!(data, Not([:finished])); Selecting a subset of columns
We keep on the columns from the following relevantcolumns variable.
const relevantcolumns = [
:participant_code, :time_in_experiment, :participant_payoff, :participant_payoff_plus_participation_fee, :session_code,
:survey_1_player_age, :survey_1_player_gender_rev, :survey_1_player_gender, :survey_1_player_employment,
:survey_1_player_region, :survey_1_player_urn_rev,
:survey_1_player_urn, :survey_1_player_colour_rev, :survey_1_player_colour, :survey_1_player_colour_drawn,
:survey_1_player_urn_winner, :survey_1_player_round1, :survey_1_player_round2, :survey_1_player_round3,
:survey_1_player_round4, :survey_1_player_round5, :survey_1_player_total_time, :survey_1_player_probability,
:survey_1_player_mechanism_winner, :survey_1_player_belief_winner, :survey_1_player_mechanism_rev,
:survey_1_player_mechanism, :survey_1_player_ability, :survey_1_player_knowledge, :survey_1_player_belief,
:survey_1_player_reasons, :time_page_1, :time_page_2, :time_page_3, :time_page_4, :time_page_5, :time_page_6,
];irrelevantcolumns = [col for col = names(data) if !in(col, relevantcolumns)]
select!(data, relevantcolumns)
irrelevantcolumns57-element Vector{String}:
"participant_id_in_session"
"participant_code"
"participant_label"
"participant_is_bot"
"participant_index_in_pages"
"participant_max_page_index"
"participant_current_app_name"
"participant_current_page_name"
"participant_time_started"
"participant_visited"
"participant_payoff"
"participant_payoff_plus_participation_fee"
"session_code"
⋮
"survey_1_player_belief"
"survey_1_player_reasons"
"survey_1_player_payoff"
"survey_1_group_id_in_subsession"
"survey_1_subsession_round_number"
"time_in_experiment"
"time_page_1"
"time_page_2"
"time_page_3"
"time_page_4"
"time_page_5"
"time_page_6"
Striping the column names from survey_1_player_
for col = names(data)
m = match(r"survey_1_player_(?<column>\w+)", string(col))
if !(m === nothing)
rename!(data, col => Symbol(m[:column]))
end
endTransform Columns Types
Transform some string columns with only two possibilities into boolean columns.
booleancolumns = [:ability, :knowledge, :gender_rev, :urn_rev, :colour_rev, :urn_winner, :mechanism_winner, :belief_winner, :mechanism_rev]
for col = booleancolumns
data[!, col] = (data[!, col] .== 1)
endTransform the mechanism column into a boolean. Value of 1 if RPS, 0 if Coin Toss.
data[!, :mechanism] = (data[!, :mechanism] .== "Rock, Paper, Scissors");Transform the colours of the balls in the ambiguity choice into boolean values. The colour black is transformed in 1, on red into 0.
data[!, :colour] = (data[!, :colour] .== "Black");Create a boolean telling whether RPS was better than Coin Toss or not.
data[!, :female] = (data[!, :gender] .== "Female")
data[!, :male] = (data[!, :gender] .== "Male");
data[!, :other] = (data[!, :gender] .== "Other")
select!(data, Not(:gender));data[!, :RPSbetter] = (data[!, :probability] .> 0.5);Consider that everyone that chose the left urn is ambiguity averse. This is a debatable assumption, but the best we can do with the available data. The difference in the number of ambiguity averse / ambiguity loving subjects is correct if ambiguity neutral subjects randonly choose between the right and left urns.
Remove then the :urn column that encodes exactly the same information.
data[!, :ambiguity_averse] = (data[!, :urn] .== "Urn Left");
select!(data, Not(:urn));Create a variable characterizing the four different combination of treatments that are possible.
data[!, :treatment] .= "A, P"
data[.!data[!, :knowledge] .& data[!, :ability], :treatment] .= "A, NP"
data[.!data[!, :knowledge] .& .!data[!, :ability], :treatment] .= "C, NP"
data[data[!, :knowledge] .& .!data[!, :ability], :treatment] .= "C, P";# Compute the observed probability of winning the mechanism of each subject.
data[!, :probability_winning] .= 0.5
data[data[!, :mechanism], :probability_winning] .= data[data[!, :mechanism], :probability];# Transform the belief of number of win into a probability
data[!, :belief] = data[!, :belief] ./ n_profiles;sum(x -> ismissing(x) ? false : x, data[!, :belief] .<= 5 / n_profiles) / sum(.!data[!, :knowledge])0.14583333333333334
A substantial number of subjects (15%) do not seem to have understood our belief question, as they report belief between 0 and 5, which seems clearly wrong.
Creating a measure of overconfidence: when people don’t know the probability (which is the only case where we have the belief in Experiment 1), it is the difference between their probability of winning the RPS and their belief. If positive, subjects are overconfident, if negative, they are underconfident.
data[!, :overconfidence] .= 0.
data[.!data[!, :knowledge], :overconfidence] = data[.!data[!, :knowledge], :belief] .- data[.!data[!, :knowledge], :probability];
data[!, :overconfident] .= (data[!, :overconfidence] .> 0);data[!, :rps_believed_really_better] = Bool.(data[!, :RPSbetter]);
data[.!data[!, :knowledge], :rps_believed_really_better] .= (data[.!data[!, :knowledge], :belief] .>= 0.5);# Unify the definition of a country and remove the previous column `:region` that encoded the same data.
data[!, :country] .= "USA";
for (i, region) = enumerate(data[!, :region])
m = match(r"(?<country>US|States|california|georgia|arizona|america)"i, region)
if m === nothing
m2 = match(r"(?<country>india)"i, region)
if m2 === nothing
data[i, :country] = titlecase(data[i, :region], strict = true)
else
data[i, :country] = "India"
end
end
end
select!(data, Not(:region));Transform the different RPS rounds choice into numbers. The translation is Rock = 1, Paper = 2, Scissors = 3
for i = 1:5
data[!, Symbol("round$(i)")] = translaterpstoint.(data[!, Symbol("round$i")]);
end## Hourly Wage
data[!, :wage] = data[!, :participant_payoff_plus_participation_fee] ./ data[!, :time_in_experiment] .* 3600;
Comments Encoding
We have encoded the comments according to three dummy variables
probability_driving_choice,preference_driving_choice,doubt_interpretationand a string of commentadditional_notes.probability_driving_choicesmeans that we read in the comment that higher probabilities of winning are what drives the choices of a mechanism over another (even if the belief/understanding and subsequent choice may not reflect that).preference_driving_choicemeans that we read in the comment a intrinsic preference for one or the other mechanism.doubt_interpretationmeans that we are not sure of the interpretation of the comment, so that our interpretation should be taken with a grain of salt. In the data, it is originally encoded as?, we replace that by a dummy of value 1 when we have a doubt.additional_notesare additional remarks we have on the comments. In particular, if a preference is expressed for one or the other of RPS / CT, we mention it.Join together the data and our comments.
Take the reasons to transform anything that looks like an absence of comment into an empty comment. Remove the previous column
:reasonsthat contained the comments. This filter is very imperfect.