To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Tuesday, April 2, 2013

Confidentiality vs. Anonymity



In the data collection process, when researchers obtain information from survey participants, they frequently state that the survey will be conducted anonymously or confidentially. The two terms have distinct meanings, and researchers should be clear on each because both are important for protecting participants. It is also important to remember that a study using only one data collection method cannot be both confidential and anonymous. Research participants should be informed beforehand of the type of data collection that will take place. They should also be told how long the data will be stored, where it will be stored, and how it will be destroyed after an appropriate amount of time.

When data is collected and held anonymously, there are no identifying values that can link the information to a participant; not even the researcher can identify a specific participant. Surveys administered through online tools are typically anonymous; however, the researcher needs to be certain that participants' IP addresses are not stored. The researcher should also keep in mind that collecting too many demographic variables can compromise anonymity, because a combination of characteristics may single out an individual participant. In an anonymous study, the researcher needs to explain how participants will be kept anonymous and state that no participant will be identified. It is also helpful to let participants know that data will be analyzed at the group level.
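To see why extra demographic variables can undermine anonymity, a minimal Python sketch can count how many respondents are the only person with a given combination of answers. The records and the field names (`gender`, `age_group`, `department`) are hypothetical, and this is only an illustration of the idea, not a formal disclosure-risk check.

```python
from collections import Counter

def count_unique_respondents(records, fields):
    """Count respondents who are the only one with their combination of values in `fields`."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)

# Hypothetical survey responses: no names, emails, or IP addresses are stored.
responses = [
    {"gender": "F", "age_group": "30-39", "department": "Nursing"},
    {"gender": "F", "age_group": "30-39", "department": "Nursing"},
    {"gender": "M", "age_group": "30-39", "department": "Nursing"},
    {"gender": "F", "age_group": "40-49", "department": "Pharmacy"},
    {"gender": "M", "age_group": "40-49", "department": "Nursing"},
    {"gender": "F", "age_group": "30-39", "department": "Pharmacy"},
]

# Each added demographic variable makes more respondents unique.
print(count_unique_respondents(responses, ["gender"]))                             # 0
print(count_unique_respondents(responses, ["gender", "age_group"]))                # 3
print(count_unique_respondents(responses, ["gender", "age_group", "department"]))  # 4
```

With gender alone nobody stands out, but with all three variables four of the six hypothetical respondents have a unique profile and could potentially be recognized by someone who knows them.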

When data is collected and held confidentially, the researcher can identify the participants. One way of tracking participants is to assign an identifying number or code to each one. Additionally, any survey that takes place in a face-to-face environment is automatically considered confidential, as the researcher will know who provided the data. Because confidential data can be linked to identifiable participants, it needs to be kept in a secure environment. As with anonymous data collection, it is also helpful to let participants know that data will be analyzed at the group level so that individuals are not identified in the results. Identifying numbers will not be presented in the results of the analyses.
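Assigning codes can be done in many ways. One minimal Python sketch, assuming a participant's name is the only direct identifier, generates a random code per participant with the standard-library `secrets` module and replaces the name with that code. The function names, the `P-` prefix, and the sample names are hypothetical. The returned key (name to code) is what keeps the data confidential rather than anonymous, so it would be stored separately from the responses in a secure location.

```python
import secrets

def assign_participant_codes(names):
    """Map each participant name to a random, non-repeating code (the linking key)."""
    key = {}
    used = set()
    for name in names:
        code = "P-" + secrets.token_hex(4)  # e.g. 'P-' followed by 8 hex characters
        while code in used:  # regenerate in the unlikely event of a collision
            code = "P-" + secrets.token_hex(4)
        used.add(code)
        key[name] = code
    return key

def deidentify(records, key):
    """Return copies of the records with the name field replaced by the participant code."""
    coded = []
    for rec in records:
        row = dict(rec)  # copy so the original identified record is not modified
        row["participant_code"] = key[row.pop("name")]
        coded.append(row)
    return coded

# Hypothetical identified responses.
records = [{"name": "Ana Ruiz", "score": 4}, {"name": "Ben Cole", "score": 2}]
key = assign_participant_codes([r["name"] for r in records])
coded = deidentify(records, key)

print(sorted(coded[0]))  # ['participant_code', 'score']
```

The analysis file (`coded`) contains only codes, while the key stays with the researcher, which matches the situation described above: the researcher can still identify participants, but the analyzed data does not name them.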

Friday, March 8, 2013

Odds Ratio


  • The odds ratio (OR) measures the association between an outcome and a treatment or exposure. In other words, it compares an outcome across two groups (exposure vs. absence of exposure).

  • OR is a comparison of two odds: the odds of an outcome occurring given a treatment compared to the odds of the outcome occurring without the treatment.
  • Odds represent the probability of an event occurring divided by the probability of the event not occurring.

  • Although related, probability and odds are not the same. Probability values can only range from 0 to 1 (0% to 100%), whereas odds can take any non-negative value, from 0 to infinity.

  • An OR of 1 indicates that the exposure has no effect on the odds of the outcome; an OR less than 1 indicates that the exposure is associated with lower odds of the outcome; and an OR greater than 1 indicates that the exposure is associated with higher odds of the outcome.

  • OR can be used to assess whether a particular treatment or exposure is a ‘risk factor’ for a particular outcome.

  • To calculate OR, the frequencies (counts) from the cross-tabulation of two dichotomous variables are required: the outcome (yes vs. no) and the exposure (yes vs. no).

  • For example, suppose a study consists of 263 participants and aims to assess the OR of having the flu virus given regular diet pill consumption.

o   The two dichotomous variables to be examined are flu virus (yes vs. no) and diet pill consumption (takes regularly vs. does not take regularly). 
o   Of the 263 participants: 45 have the flu virus and take diet pills regularly, 86 have the flu virus and do not take diet pills regularly, 32 do not have the flu virus and take diet pills regularly, and 100 do not have the flu virus and do not take diet pills regularly.
o   To calculate the OR, we place the odds for the exposed group (regular diet pill users) in the numerator and the odds for the unexposed group in the denominator:

      • the numerator of the ratio is the number of participants who have the flu and take diet pills regularly (45) divided by the number of participants who do not have the flu virus and take diet pills regularly (32); and
      • the denominator of the ratio is the number of participants who have the flu and do not take diet pills regularly (86) divided by the number of participants who do not have the flu and do not take diet pills regularly (100).
      • OR = (45/32) / (86/100) = 1.406 / 0.86 ≈ 1.64

o   Thus, the odds of having the flu are about 1.64 times as high for participants who take diet pills regularly as for those who do not.
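The odds and odds-ratio calculations in this post can be checked with a short Python sketch. The function names and the a/b/c/d cell labels are assumptions made for illustration; the counts are the flu and diet pill numbers from the example. It also shows that the odds for pill takers, computed from the probability 45/77, equal 45/32, and that the ratio is the cross-product 4500/2752 ≈ 1.635, which rounds to 1.64.

```python
import math

def odds(p):
    """Odds of an event with probability p: P(event) / P(no event)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """Odds ratio from the four cells of a 2x2 table of counts.

    a: outcome present, exposed      b: outcome absent, exposed
    c: outcome present, unexposed    d: outcome absent, unexposed

    OR = (a / b) / (c / d), equivalently (a * d) / (b * c).
    A zero in b, c, or d raises ZeroDivisionError.
    """
    return (a / b) / (c / d)

# Probability and odds are related but not the same.
print(odds(0.5))   # 1.0 (even odds)
print(odds(0.75))  # 3.0 (3 to 1)

# Flu / diet pill counts from the example.
flu_pill, no_flu_pill = 45, 32          # take diet pills regularly
flu_no_pill, no_flu_no_pill = 86, 100   # do not take diet pills regularly

# Odds of flu among pill takers, computed from the probability 45/77.
p_flu_pill = flu_pill / (flu_pill + no_flu_pill)
print(math.isclose(odds(p_flu_pill), flu_pill / no_flu_pill))  # True

or_flu = odds_ratio(flu_pill, no_flu_pill, flu_no_pill, no_flu_no_pill)
print(round(or_flu, 2))  # 1.64
```

Keeping the exposed group's odds in the numerator matters: swapping the groups gives the reciprocal (about 0.61), which describes the same association from the other direction.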