A common source of mistakes is generating a binary variable that should classify observations according to a particular condition (for example, tagging everyone with income higher than 100K as a "high income individual"). The problem: a Stata user may incorrectly classify/tag people who do not meet the specified condition.

To test your understanding, generate a variable that is sometimes positive and contains missing values in all other cases. As a simplest case, generate a sample of just two observations:

    clear
    set obs 2
    gen x = .
    replace x = 1 if _n == 2   /* Set observation 2 to one */

The first observation is missing and the second observation is positive. Then run the following code in Stata, or in your head, and see for yourself if you know what happens:

    * Classify x based on its sign
    gen above = (x > 0)
    gen below = (x < 0)
    list

Have they been classified as intended? No. The first observation, where x is missing, is tagged as above zero.

In the current system, you must be aware that missing values are coded and treated as positive infinity: a missing value is larger than any number, so x > 0 is true when x is missing. Once this fact is absorbed, everything is consistent: drop and keep statements work as one would expect, and the logical comparisons make sense.

To classify each observation properly, run the following instead:

    drop above below
    gen above = (x > 0) if !missing(x)
    gen below = (x < 0) if !missing(x)

(Many people will generate a variable equal to zero and then run something like replace above = 1 if x > 0. That will not help, because the condition x > 0 still holds for the missing observation.)

This should bring home that if you are not absolutely sure what each line of code is doing, it is a good idea to check the dataset and inspect any new variables you generate.

Based on the pitfall shown in the simple example, here are a few suggestions.

Tip 1 Run the mdesc command to see which variables contain missing values. (mdesc is user-written; if it is not installed, get it with ssc install mdesc.)

Tip 2 When you generate variables based on a condition contained in an existing variable, execute your command only for the non-missing cases, as above:

    gen above = (x > 0) if !missing(x)

which is equivalent to:

    gen above = (x > 0) if x < .

(This is safer than generating wrong values for some observations and then fixing those instances by replacing them with missing values.)

Tip 3 If the number of categories is small, you could replace values with specific conditions. For example, if you know that the only positive values are 1 and 2, this would work:

    gen above_zero = .
    replace above_zero = 1 if x == 1 | x == 2

But that obviously only works with discrete values of x.

As always, recode your variables slowly and carefully.
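The warning about generating a variable equal to zero and then running replace above = 1 if x > 0 can be checked directly. The do-file sketch below rebuilds the two-observation example and compares that approach with the `if !missing(x)` version. The variable names above_bad and above_ok are hypothetical and chosen only for this illustration.

```stata
* Rebuild the two-observation example: x is missing in obs 1 and equals 1 in obs 2
clear
set obs 2
gen x = .
replace x = 1 if _n == 2

* Pitfall: start at zero, then flag positive values.
* Stata treats missing as larger than any number, so x > 0 is true in obs 1 too.
gen above_bad = 0
replace above_bad = 1 if x > 0

* Safer: evaluate the condition only where x is observed
gen above_ok = (x > 0) if !missing(x)

list x above_bad above_ok
```

After running it, the listing should show above_bad equal to 1 in the first observation even though x is missing there. By contrast, above_ok stays missing.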
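The advice to check the dataset and inspect new variables can be built into the do-file itself. The sketch below is one possible way to do this, not the author's method. It applies Stata's built-in count, tabulate and assert commands to the same two-observation example. If a missing x ever ends up classified, assert stops the do-file with an error.

```stata
* Two-observation example, classified the safe way
clear
set obs 2
gen x = .
replace x = 1 if _n == 2
gen above = (x > 0) if !missing(x)

* How many observations of x are missing? (stored in r(N))
count if missing(x)

* Cross-tabulate, showing missing values as their own category
tabulate x above, missing

* Stop the do-file if a missing x ever received a classification
assert missing(above) if missing(x)
```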
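The sketch below shows a fuller version of the approach that replaces values under specific conditions when the number of categories is small. Like the tip, it assumes that the only positive values of x are 1 and 2. Two parts are assumptions added for illustration: the small dataset and the extra line setting above_zero to 0 when x <= 0. Because the variable starts out missing, any observation that no condition matches, including a missing x, stays missing.

```stata
* Hypothetical data: x takes the values -1, 0, 1, 2 and missing
clear
input x
-1
0
1
2
.
end

* Start from missing so no observation is classified by accident
gen above_zero = .

* Assumption (from the tip): the only positive values are 1 and 2
replace above_zero = 1 if x == 1 | x == 2

* Added for illustration: flag non-positive values as 0.
* A missing x is never <= 0, so it is left missing here.
replace above_zero = 0 if x <= 0

list x above_zero
```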