Channel: Statalist

Mark Variable if Before or After Specific Date by Group

Good morning,

Would appreciate some help with this please.

Looking to code the 'wanted' column

The group variable is 'pet'
If 'petcode' = 2 I need to put a '3' in the wanted column for all ddates that occurred before 'petcode' = 1 for that pet
If 'petcode' = 2 I need to put a '4' in the wanted column for all ddates that occurred after 'petcode' = 1 for that pet
If 'petcode' = 1 Just put a '1' in the wanted column

Thanks

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float ddate str16 pet byte petcode float wanted
22397 "Dog" 2 3
22611 "Dog" 1 1
22653 "Dog" 2 4
22686 "Dog" 2 4
22722 "Dog" 2 4
22765 "Dog" 2 4
22854 "Dog" 2 4
22907 "Dog" 2 4
22930 "Dog" 2 4
22983 "Dog" 2 4
23005 "Dog" 2 4
23046 "Dog" 2 4
23084 "Dog" 2 4
23109 "Dog" 2 4
23128 "Dog" 2 4
22876 "Cat" 1 1
22897 "Cat" 2 4
22906 "Cat" 2 4
22983 "Cat" 2 4
23005 "Cat" 2 4
23225 "Cat" 2 4
end
format %td ddate
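
A minimal sketch of one way the rule described above could be coded, assuming each pet has at most one petcode == 1 row (as in the example). The names refdate and wanted2 are hypothetical helpers, and the input is a trimmed copy of the dataex listing so the snippet runs on its own.

```stata
clear
input float ddate str16 pet byte petcode float wanted
22397 "Dog" 2 3
22611 "Dog" 1 1
22653 "Dog" 2 4
22876 "Cat" 1 1
22897 "Cat" 2 4
end
format %td ddate

* refdate (hypothetical name): date of the petcode == 1 row, copied to every row of that pet
egen refdate = min(cond(petcode == 1, ddate, .)), by(pet)

* wanted2 (hypothetical name): 1 on the reference row, 3 before it, 4 after it
generate byte wanted2 = 1 if petcode == 1
replace wanted2 = 3 if petcode == 2 & ddate < refdate & !missing(ddate, refdate)
replace wanted2 = 4 if petcode == 2 & ddate > refdate & !missing(ddate, refdate)

* check against the hand-coded column from the post
assert wanted2 == wanted
```

The !missing() guards matter because Stata treats missing as larger than any number, so a pet without a petcode == 1 row would otherwise be coded 3 throughout.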

svylogitgof command showing r(198) Error.

Hi! I am fitting a multivariate logistic regression on survey data in Stata 18. I have applied the weights as specified by the survey and have been prefixing my commands with svy:. After fitting my main model I ran svylogitgof, which I have installed, for a goodness-of-fit test, but it stops with error r(198). Below is my Stata 18 output.



. svyset [pw=intwt0], jkrweight(intwt001-intwt141, multiplier(1)) vce(jackknife)dof(25)

Sampling weights: intwt0
VCE: jackknife
MSE: off
Jackknife weights: intwt001 .. intwt141
Design df: 25
Single unit: missing
Strata 1: <one>
Sampling unit 1: <observations>
FPC 1: <zero>

. xi: svy: logistic medicalcircumcision i.males_age i.wealthindex i.hivstatus i.evertestedhiv i.Regio
> n
i.males_age _Imales_age_1-9 (naturally coded; _Imales_age_1 omitted)
i.wealthindex _Iwealthind_1-5 (naturally coded; _Iwealthind_1 omitted)
i.hivstatus _Ihivstatus_0-2 (naturally coded; _Ihivstatus_0 omitted)
i.evertestedhiv _Ieverteste_0-1 (naturally coded; _Ieverteste_0 omitted)
i.Region _IRegion_1-4 (naturally coded; _IRegion_1 omitted)
(running logistic on estimation sample)

Jackknife replications (141): .........10.........20.........30.........40.........50.........60.....
> ....70.........80.........90.........100.........110.........120.........130.........140. done

Survey: Logistic regression

Number of strata = 1 Number of obs = 4,460
Population size = 298,166.63
Replications = 141
Design df = 25
F(18, 8) = 6.80
Prob > F = 0.0047

-------------------------------------------------------------------------------------
                    |              Jackknife
medicalcircumcision | Odds ratio  std. err.        t   P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
      _Imales_age_2 |   .5392798   .0570262    -5.84   0.000     .4337415    .6704979
      _Imales_age_3 |   .4172206   .0553734    -6.59   0.000     .3174352    .5483734
      _Imales_age_4 |   .3751198   .0550486    -6.68   0.000     .2772749    .5074922
      _Imales_age_5 |   .4593065   .0677852    -5.27   0.000     .3389212    .6224529
      _Imales_age_6 |   .3305909    .058585    -6.25   0.000     .2295005    .4762097
      _Imales_age_7 |   .2424751   .0569683    -6.03   0.000     .1494589    .3933804
      _Imales_age_8 |   .2505148   .0610396    -5.68   0.000     .1516685    .4137817
      _Imales_age_9 |   .1784419   .0280643   -10.96   0.000     .1290696    .2467005
      _Iwealthind_2 |   1.005121    .124558     0.04   0.967     .7787094    1.297362
      _Iwealthind_3 |   1.008603    .116643     0.07   0.942     .7948389    1.279856
      _Iwealthind_4 |    1.20144   .1421264     1.55   0.133     .9416558    1.532894
      _Iwealthind_5 |   1.663429   .2167442     3.91   0.001     1.271915    2.175456
      _Ihivstatus_1 |   .5163145   .0615363    -5.55   0.000     .4039345    .6599602
      _Ihivstatus_2 |   1.024993   .1377831     0.18   0.856     .7771174    1.351934
      _Ieverteste_1 |   5.013099   .6252474    12.93   0.000     3.877471    6.481329
         _IRegion_2 |   .9098206   .1090208    -0.79   0.438     .7108488    1.164486
         _IRegion_3 |    .717331   .1125807    -2.12   0.044     .5192085    .9910542
         _IRegion_4 |   .8954747   .1039154    -0.95   0.351     .7051106    1.137233
              _cons |   .2204909    .036229    -9.20   0.000     .1571896    .3092842
-------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

. svylogitgof
option over() not allowed
r(198);

end of do-file

r(198);

Diff-in-Diff with log or non-log prices

Hi,

I have conducted a difference-in-differences regression to study the impact of establishing a factory on housing prices in a certain region. I have used log prices as my dependent variable. I get parallel trends and a significant post-treatment effect.

However, when doing the same DID-regression with non-log prices, I get parallel trends, but no significant effect post-treatment.

Should I include both of these regressions in my thesis for the sake of transparency and discussion? Or should I use only the one with logged prices, since similar studies often only look at logged prices?

//Oliver

multilevel meta-analysis

Can a multilevel meta-analysis be conducted when the published articles for some studies report varying levels of detail while others do not? It is my understanding that the lower levels can be aggregated to the higher level and a standard meta-analysis can then be conducted. However, is it possible to incorporate such data in Stata?

Bug in Python indentation?

My Python-containing ado programs that ran before now give an r(7102); error: IndentationError: expected an indented block . . .. I wrote them last year in Release 17.0, but I thought that I'd used them as recently as earlier this summer (i.e., in the current release) without any trouble. When I went to use one today I discovered this new (mis)behavior.

I can get simple function definitions to run by using a reverse solidus and semicolon fix-up as shown in this post from April this year. For example, this
Code:
*! indent_bug.ado
program define indent_bug
    version 18.0
    syntax

    python: test()
    
end

version 18.0

python:

def test():
    print("Here.")
    print(". . . and Here.")

end
gives the error, but this
Code:
*! indent_bugbreaks.ado
program define indent_bugbreaks
    version 18.0
    syntax

    python: test()

end

version 18.0

python:

def test(): \
    print("Here."); \
    print(". . . and Here.")

end
runs without error.

Unfortunately, slightly more complicated code, such as
Code:
import json
def search(keyword):
    with open("F:/" + keyword + ".json", "r") as f:
        response = json.load(f)
    return(response)
still gives an error (although a different one) even with the reverse solidus-semicolon workaround. (That code snippet works as expected in my Python environment. Excuse the unusual choice of names—it's a test harness for another function that will call search() to access a search engine API.)

I've attached three files, indent_bug.ado, indent_bugbreaks.ado and test.py,* which I called in sequence from the command line
Code:
indent_bug
indent_bugbreaks
python script test.py, global
python: test()
Only the first ado file shows the error. The resulting output is in the log file indent_bug.smcl, which is also attached.

Is anyone else experiencing this or is it something peculiar to my setup? Or am I misunderstanding something? I've Googled python indentation stata but haven't turned up much on the topic, excepting that thread linked to above.

* test.py contains the same code as in the auxiliary Python section of indent_bug.ado (shown in the first code block above). I've had to append a .txt extension to test.py in order to circumvent the forum's limitations on attachments' file-name extensions. To run the relevant line in the code just above, remove the second file name suffix (rename test.py.txt to test.py after downloading).

Predicted probabilities with a continuous x continuous interaction in a multilevel model

Hello, everyone. I am currently working with cross-sectional survey data from seven countries, comprising 35,893 observations. My dependent variable is "satisfaction with democracy" (categorized into four categories), and my independent variable is "populism attitudes." Additionally, I am considering the moderating variable, "corruption levels."

The model includes an interaction between two continuous predictors, populism attitudes and corruption levels.

I would like to calculate the marginal effects and predicted probabilities of populism attitudes and satisfaction with democracy based on different levels of corruption.

However, I am facing challenges in finding the appropriate information on how to calculate a continuous-by-continuous interaction in a multilevel model.

Here is my code.



Code:
        
meologit demo_satisfy c.Populism c.corruption ///
        c.Populism#c.corruption ///
        $control ///
        || n_country: , diff

margins, dydx(Populism) at(corruption=(0.08(0.05)0.79)) expression(predict(outcome(4) mu fixed) + predict(outcome(3) mu fixed)) vsquish atmean
* satisfaction with democracy is predicted by combining 3 (satisfied) and 4 (very satisfied)


------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Populism   |
         _at |
          1  |   .2285128   .0448994     5.09   0.000     .1405116     .316514
          2  |    .226162   .0380684     5.94   0.000     .1515493    .3007747
          3  |   .2232461   .0318626     7.01   0.000     .1607965    .2856957
          4  |   .2197591   .0263693     8.33   0.000     .1680763    .2714419
          5  |   .2156988   .0216909     9.94   0.000     .1731855    .2582121
          6  |   .2110671   .0179506    11.76   0.000     .1758845    .2462496
          7  |   .2058698   .0152808    13.47   0.000       .17592    .2358197
          8  |   .2001175   .0137635    14.54   0.000     .1731415    .2270934
          9  |   .1938246   .0133329    14.54   0.000     .1676925    .2199567
         10  |   .1870103   .0137537    13.60   0.000     .1600535     .213967
         11  |   .1796978   .0147392    12.19   0.000     .1508096     .208586
         12  |   .1719148   .0160741    10.70   0.000     .1404101    .2034195
         13  |   .1636927   .0176435     9.28   0.000     .1291121    .1982734
         14  |   .1550669   .0194036     7.99   0.000     .1170365    .1930973
         15  |   .1460762   .0213452     6.84   0.000     .1042402    .1879121
------------------------------------------------------------------------------
I'm not sure whether "margins , dydx(Populism) at(corruption=(0.08(0.05)0.79)) expression(predict (outcome(4) mu fixed) + predict(outcome(3) mu fixed)) vsquish atmean" gives predicted probabilities or marginal effects.

I really tried to do this correctly, however I apologize if this is too wordy, too vague, or difficult to read. Thank you in advance.

.dta file corrupt. The file unexpectedly ended before it should have.- Problem

Hi. New to Stata.
I imported a file using:

import sas "filepath", clear case(lower)

and then saved it using:

save "filepath", replace

Then I tried to use the file using:

clear all
use "filepath"

and got this result:

.dta file corrupt
The file unexpectedly ended before it should have.

I'm clueless. Any advice on the next steps? Thanks

Placebo test for DID analysis

Hello!
I'm conducting a project utilizing a Difference-in-Differences (DiD) framework to investigate the effects of a policy implemented in a staggered manner across various states and years.
Below is my regression model and the associated code. Now I want to perform a placebo test grounded on this baseline regression, but I don't know how to do that in Stata. Thanks!

y=a1*treat*post+controls+FE
code: reghdfe y treat*post $controls, absorb($fixed_effects) vce(cluster fips)

Superimposing median line/point in a twoway kdensity plot

Hi there,

I am having issues with a twoway kdensity plot on which I would like to superimpose either a line or a point representing the median (by country).

I am working with Stata 18 and my dataset is a subset of the ESS (European Social Survey).

Following what has been said in other posts on the topic I coded the following:

Code:
 egen median_Redistribution_norm = median(Redistribution_norm), by(cntry) // creates a variable holding the median of "Redistribution_norm" for each country
Then I am stuck with the following twoway kdensity code, where I do not know how to add the above-mentioned value (let's say in the form of a line):

Code:
 twoway kdensity Red, bw(0.2) title(Redistribution Policy) range(0 1) xlabel(0 "Left" 1 "Right") by(cntry)
I previously tried the xline() graph option, yet it seems to work only when you type in the exact value, which does not help with a different median for each country.

Thanks in advance

Mattia
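
A rough sketch of one possible workaround, not a definitive answer: since xline() takes fixed numbers, the per-country median can instead be drawn as a second twoway layer that by() splits along with the density. The data below are simulated stand-ins (two made-up countries), and the helper variables first and zero are hypothetical names. The marker sits on the x axis at each country's median.

```stata
clear
set seed 2023
set obs 600
generate str2 cntry = cond(_n <= 300, "AA", "BB")   // two made-up countries
generate Redistribution_norm = runiform()           // simulated stand-in for the ESS variable

egen median_Redistribution_norm = median(Redistribution_norm), by(cntry)
egen byte first = tag(cntry)      // flags one row per country, so each marker is drawn once
generate byte zero = 0            // y position of the median marker

twoway (kdensity Redistribution_norm, bw(0.2) range(0 1))              ///
       (scatter zero median_Redistribution_norm if first, msymbol(D)), ///
       by(cntry, title("Redistribution Policy") legend(off))           ///
       xlabel(0 "Left" 1 "Right")
```

With the real data, only the egen, generate, and twoway lines would be needed.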

Monotone instrumental variables

Hello everyone,

I am running a regression model with instrumental variables to see the effect that intimate partner violence has on a woman's decision to have a job. The proposed instrument is the early exposure of adult couples to domestic violence, given that domestic violence is often the product of an intergenerational experience of violence at home.

However, my instrument is not that strong, so I want to use the monotone instrumental variables technique, but I don't know which Stata command implements it. Could you help me with that, please?

This is the command I am using:

ivregress 2sls trabaja urbana i.region edad edad2 mestizo i.educacion_cat Hijos_convive Indice_hogarf2 rol_poder rol_genero Independencia Dispone_dinero_gastos (vp_casadas = antecedentes), vce(robust) first

Binscatter and Confidence Interval

Is there any way to add confidence intervals to this code, so that they show in the graph?

binscatter cotiza_pension TT, line(connect) xline(0)

Thanks!
José

Is there a way to know if an ID string variable is the same between two different datasets?

Hi everyone,

I have a quick question. I'd like to know if the IDs of two datasets are different in a more efficient way than mine (possibly my way is wrong...):

I have two datasets with the same identifier variable name: "id".

So here are my steps:

(1) I open the first dataset and run the command -isid id-

(2) I don't get any error messages, so everything's fine: the id variable uniquely identifies each observation.

(3) I open dataset 2. I carry out exactly the same procedure as in (2). No error message either.

(4) To see if the two IDs are identical, I run the following command. Let's imagine that my master data is "dataset_1" and my using dataset is "dataset_2".

Code:
merge 1:1 id using dataset_2
I get this for example:
Code:
. merge 1:1 id using dataset_2

    Result                      Number of obs
    -----------------------------------------
    Not matched                     5,808,563
        from master                 2,579,289  (_merge==1)
        from using                  3,229,274  (_merge==2)

    Matched                                 0  (_merge==3)
    -----------------------------------------

.
Can I deduce from this that the two datasets have different IDs (that is, none in common)? Is there a better way of checking this?
I ask because I need to compare one file with other files. The other files are monthly and run from January 2021 to July 2023, so the procedure would be very time-consuming.

Thanks a lot.
Best,
Michael
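
A small sketch of the merge-and-count idea from step (4), run on toy stand-ins for dataset_1 and dataset_2 (the ids are made up). It only illustrates how the number of shared ids can be read off _merge. With string ids, zero matches can also come from formatting differences such as stray spaces or letter case, because the comparison is exact.

```stata
* toy stand-in for dataset_1 (hypothetical ids)
clear
input str5 id
"A001"
"A002"
"A003"
end
tempfile d1
save `d1'

* toy stand-in for dataset_2 (hypothetical ids)
clear
input str5 id
"A002"
"A003"
"A004"
end
tempfile d2
save `d2'

use `d1', clear
merge 1:1 id using `d2'
quietly count if _merge == 3          // ids found in both files
local both = r(N)
display "ids present in both files: `both'"
```

For the monthly files, the same three lines (use, merge, count) could be wrapped in a loop over the file names.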

calculating relative risk ratio from logistic regression

Hi, I was initially calculating odds ratios with the following model,
where my outcome variable, sepsis, is binary (0 or 1)
and procedure_type is also binary (0 or 1):

Code:
logit sepsis i.procedure_type comorbidity  [pw=_weight], or
I now want to calculate a RR

Is it correct to use the following - just making sure:
Code:
mlogit sepsis i.procedure_type comorbidity  [pw=_weight], rrr
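
A hedged sketch of a commonly used alternative, not necessarily the right model for this study: with a binary outcome, mlogit with rrr reproduces the odds ratio from logit rather than a risk ratio, whereas a Poisson model with robust standard errors (often called modified Poisson) reports exponentiated coefficients that can be read as risk ratios. The data below are simulated purely so the snippet runs; only the variable names come from the post.

```stata
clear
set seed 1
set obs 1000
generate byte procedure_type = runiform() < 0.5            // simulated exposure
generate comorbidity = rnormal()                           // simulated covariate
generate byte sepsis = runiform() < cond(procedure_type, 0.2, 0.1)
generate _weight = 1 + runiform()                          // simulated sampling weight

* pweights imply robust standard errors; irr shows exp(b), read here as a risk ratio
poisson sepsis i.procedure_type comorbidity [pw=_weight], irr
```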

generating a variable based on values of another variable

I am trying to generate a variable that gives an instructor ID for a student if that student took a particular course that term with that instructor, but there are a few aspects of the problem that make it more complicated than I had originally realized, and this is now beyond my Stata variable generation knowledge.

It would be easy to do if each student only ever enrolled in one section of a course in a given term, and never switched sections; however, the problem arises when there is more than one section of the course in which the student was enrolled at some point in the term, so I have to generate this new instructor variable using a set of conditions that is more complicated than I have been able to figure out how to implement so far.

Here is a simplified version of the problem. Here are the variables I have:
--studentID
--term
--coursenumber
--instructor
--sortvariable (missing in some cases where I don't want an instructor name assigned to the new variable)
--maxsortvariable (this is the maximum value of sortvariable for this studentID for this term for this coursenumber)

I want to generate a new variable for each of five courses (let's say course numbers 100, 101, 102, etc.) called instructor100, instructor101, etc. The process will be the same for each course, so let's just look at one example:

We want to generate a new variable called instructor100 so that:
For each case, if this student did not take coursenumber 100 in this term; or if sortvariable (and therefore maxsortvariable) is empty for all cases in which coursenumber==100 for this student in this term, instructor100 should be empty.
Otherwise, instructor100 should be set to the value of instructor for which sortvariable has the maximum possible value when coursenumber==100 in this term for this student.
If there are multiple cases where sortvariable has the maximum possible value when coursenumber==100 in this term for this student, then we just want one of the instructor values assigned at random to instructor100 (but instructor100 should have the same value for all cases with the same studentID and term). There are not a lot of cases of this, but if it happens, we need a way to deal with it.

So, for example, something like this:
studentID  term  coursenumber  instructor  sortvariable  maxsortvariable  instructor100
        1  1199           100          10             1              1.5             11
        1  1199           100          11           1.5              1.5             11
        1  1199           200          12             2                2              .
        1  1202           300          13           2.5              2.5              .
        1  1206           100          14             3                3             14
        1  1206           100          15             3                3             14
        1  1209           100          12             .                .              .
For term 1206, instructor=14 has been chosen for the value of instructor100 at random, but it is the same random choice for ALL cases with the same studentID and term number (i.e., these either both need to be 14 or both need to be 15, but not one 14 and one 15). This random choice condition I can do through brute force at the end if necessary (by dropping duplicates by studentID term coursenumber and telling Stata to force drop even though data will be lost), so I can do that if it is necessary (or makes the variable generation easier to write out elegantly).

Thanks in advance for any advice!
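
One possible way to sketch the three rules above, using the example rows from the post as input. The helper variables u, ok and pick are hypothetical names, instructor is assumed to be numeric, and the result is filled in only on the course-100 rows, which is how the example table reads.

```stata
clear
input studentID term coursenumber instructor sortvariable
1 1199 100 10 1
1 1199 100 11 1.5
1 1199 200 12 2
1 1202 300 13 2.5
1 1206 100 14 3
1 1206 100 15 3
1 1209 100 12 .
end

set seed 20230901                    // makes the random tie-break reproducible
generate double u = runiform()       // random tie-breaker
generate byte ok = coursenumber == 100 & !missing(sortvariable)

* among eligible rows of a student-term, the last one has the largest sortvariable
bysort studentID term ok (sortvariable u): generate pick = instructor[_N] if ok

* missing sorts last, so pick[1] is the chosen instructor whenever one exists
bysort studentID term (pick): generate instructor100 = pick[1] if coursenumber == 100
drop u ok pick
list, sepby(studentID term)
```

Dropping the final `if coursenumber == 100` would instead copy the value to every row of that student and term.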

Gmm

Hello, I want to do this exercise in Stata, but I'm just starting to use Stata and I am really struggling. If someone could help, it would be really appreciated.

Based on the equation $\log(\text{wages})_i = \beta_0 + \beta_1 \text{Educ}_i + \beta_2 \text{Exper}_i + \beta_3 \text{Exper}_i^2 + \beta_4 \text{Gender}_i + \beta_5 \text{Gender}_i \times \text{Educ}_i + u_i$, and the choice of instruments that I have already made, present the generalized method of moments (GMM) estimation principle. You should indicate the number of moments that you will use, state the GMM criterion function and explain the role of the so-called weight matrix (you do not have to derive the estimator).

Thanks in advance

How to merge in a dataset with duplicate IDs

Hi everyone,

This is my first post to Statalist, so please forgive any errors. I have a large, registry based dataset (1 million+ observations) that has all information regarding pregnancies for a large European country. Separately, I have a dataset including all patient visits to a health practitioner, including diagnoses. I would like to combine these two datasets to have all diagnoses linked to each mother in the first dataset.

The problem that I am running into is that both datasets have multiple entries for the same ID number. The pregnancy registry has multiple pregnancies per woman, and the patient visit dataset includes all visits for each ID number. When I try to merge using 1:1, m:1, or 1:m using ID number I get a notification that the ID number does not uniquely identify observations (true for both datasets). Following the advice of others on Statalist, I used the joinby command, which seems to have linked my datasets properly, but now includes many more observations than I am looking for.

I am hoping to keep each pregnancy as a separate observation, but in this new set combined with joinby, there are multiple observations per ID number that correspond to separate patient visits. Please see the example below:

Dataset 1:

ID Number  Year (of birth)  Var1  Var2  Var3  Var4  ....
1          2008             1     1     1     2
2          2009             3     1     2     1
3          2008             2     1     3     1
4          2009             1     2     2     2
5          2007             2     3     1     1
5          2010             3     4     1     1
6          2007             2     2     2     2

Dataset 2:
ID Number  Year (of diagnosis)  Diagnosis
1          2008                 1
1          2008                 3
1          2008                 2
2          2009                 1
2          2009                 2
3          2010                 3
3          2010                 2

I would like the data to look like this:

ID Number  Year (of birth)  Var1  Var2  Var3  Var4  Diagnosis 1  Diagnosis 2  Diagnosis 3...
1          2008             1     1     1     2     1            3            2
2          2009             3     1     2     1     1            2            .
3          2008             2     1     3     1     3            2            .
4          2009             1     2     2     2     .            .            .
5          2007             2     3     1     1     .            .            .
5          2010             3     4     1     1     .            .            .
6          2007             2     2     2     2     .            .            .


Please let me know if you have any advice!

Thank you, Nicole
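
A minimal sketch of one route to the layout shown above, as an alternative to joinby: reshape the visit data to one row per ID first, then merge it onto the pregnancies with m:1. The inputs are the example rows from the post; the variable names id, dxyear, birthyear, diagnosis and j are hypothetical, and the Var1 to Var4 columns are left out for brevity.

```stata
* Dataset 2: one row per diagnosis
clear
input id dxyear diagnosis
1 2008 1
1 2008 3
1 2008 2
2 2009 1
2 2009 2
3 2010 3
3 2010 2
end
sort id, stable                  // keeps the original order within each id
by id: generate j = _n           // running diagnosis number per woman
drop dxyear                      // reshape wide needs other variables constant within id
reshape wide diagnosis, i(id) j(j)
tempfile dxwide
save `dxwide'

* Dataset 1: one row per pregnancy
clear
input id birthyear
1 2008
2 2009
3 2008
4 2009
5 2007
5 2010
6 2007
end
merge m:1 id using `dxwide', keep(master match) nogenerate
sort id birthyear
list, sepby(id)
```

Every pregnancy stays a separate observation, and a woman's diagnoses are repeated on each of her pregnancies regardless of the year of diagnosis, which is what the desired table shows.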

How to add error bar to bar chart?

Hi, I am new to this lovely forum. I want to add an error bar at the top of each of the bars in this graph. How can I do it?
Thanks in advance for the help.

Handling data from various years and various time interval for each individual

Hello, I would like to know the detailed method of handling data from various years and various time intervals for each individual.
I aim to construct a new dataset where each individual is paired with a specific test name with specified visit intervals.

The lab tests were conducted on different dates for each ID, with a substantial amount of data generated for each date.
I intend to assign implementation dates to each ID as "visit1", "visit2", and so forth.
Additionally, I want to associate a lab testname with each visit and consolidate the results.

For example, I would like to create variables like "visit1_labtestA", "visit1_labtestB", "visit2_labtestA", and "visit2_labtestB" for each ID.
These variables will contain the respective lab values.

My question is whether it's possible to build this new dataset in Stata and analyze the effect of the lab test results (such as A or B) within different time intervals on individual outcomes.

ID  Lab date    Lab test name  Lab value
1   2011-01-05  A              0.5
1   2011-01-05  B              0.7
1   2011-02-05  A              0.8
1   2011-02-05  B              0.3
2   2010-01-05  A              1.2
2   2010-01-05  B              1.4
2   2010-04-05  A              1.6
2   2010-04-05  B              1.8
3   2012-01-05  A              0.6
3   2012-02-05  B              0.4
3   2013-03-05  A              0.5
3   2013-04-05  B              0.3
4   2014-01-05  A              0.2
4   2014-02-05  B              0.1
to a new table
ID  visit1_A  visit1_B  visit2_A  visit2_B
1   0.5       0.7       0.8       0.3
2   1.2       1.4       1.6       1.8
3   0.6       0.4       0.5       0.3
4   0.2       0.1
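
A rough sketch of how the long table could be turned into the wide one, under the assumption (read off the example for ID 3) that a "visit" is the n-th time a given test was taken by that ID, in date order. Variable names such as labdate_s, testname, value and key are hypothetical, and only IDs 1, 3 and 4 from the example are typed in.

```stata
clear
input id str10 labdate_s str1 testname value
1 "2011-01-05" "A" 0.5
1 "2011-01-05" "B" 0.7
1 "2011-02-05" "A" 0.8
1 "2011-02-05" "B" 0.3
3 "2012-01-05" "A" 0.6
3 "2012-02-05" "B" 0.4
3 "2013-03-05" "A" 0.5
3 "2013-04-05" "B" 0.3
4 "2014-01-05" "A" 0.2
4 "2014-02-05" "B" 0.1
end
generate labdate = date(labdate_s, "YMD")      // string date to Stata daily date
format %td labdate

* visit number: running count of each test within ID, in date order
bysort id testname (labdate): generate visit = _n

* build the suffix visit#_test and reshape to one row per ID
generate key = "visit" + string(visit) + "_" + testname
keep id key value
reshape wide value, i(id) j(key) string
rename value* *                                // valuevisit1_A becomes visit1_A
list
```

The resulting file has one row per ID, so it could then be merged 1:1 on id with an outcome dataset.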

Exporting with .pdf changes the font for graphs

Hello,
I have a problem with the fonts Catenerao and CMU Serif, which are installed on my Windows machine and in Stata (version 18).
Even though I can see the fonts in my graphs in the Stata Graph window, I cannot export the graph with the chosen font (for example, CMU Serif turns into Times New Roman). When I save it, the font changes to something else.

I tried the "graph set window fontface" command, tried reinstalling the fonts for all users in Windows, and tried different PDF readers, but it is not working.

Is there anyone here who has experienced the same problem?

Thank you so much