Tuesday, January 29, 2019

Install TeXstudio and Dropbox on Ubuntu

1. Install TeXstudio

Open a terminal and type:

sudo add-apt-repository ppa:sunderme/texstudio

(If an old version is already installed, first run: sudo apt-get remove texstudio-d)

sudo apt-get update

sudo apt-get install texstudio

2. Dropbox (permission issues)

sudo chmod +s /usr/lib/policykit-1/polkit-agent-helper-1

sudo dpkg --configure -a





Tuesday, June 17, 2014

Possibly the best way to output summary statistics

estpost sum wheat_2_value wheat_7_value wheat_12_value hay_1_value if xxxx==xxx
esttab using ss.rtf, cells("mean(fmt(2)) sd(fmt(2)) min(fmt(1)) max(fmt(0))") nomtitle nonumber replace label

Source: http://21cresearcher.blogspot.com/2009/09/stata-how-to-export-descriptive.html

How to make outsum work on Stata 12

The outsum package is incompatible with newer versions of Stata, i.e. versions 12 and 13. However, you can modify the outsum ado file to make it work in the new versions. To do this, follow the steps below.

1. Type "which outsum" and run it to locate the outsum.ado file; the path is usually "c:\ado\plus\o\outsum.ado". Copy the path.

2. Type "doedit [paste your outsum.ado path here, without the brackets]" and run it; a code editing window will pop up.

3. Press Ctrl+H to open the replace window. Find all "_all" and replace them with "*". Then save the file.

4. Run "cscript" to clear the memory and reload all ado files.

Now outsum will work in later versions of Stata.
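
If you would rather do the find-and-replace in step 3 from outside Stata, here is a minimal Python sketch. The function name patch_ado is hypothetical, and the sketch assumes the ado file is plain text that your default encoding can read. It keeps a backup copy next to the original.

```python
import shutil
from pathlib import Path


def patch_ado(path):
    """Replace every "_all" with "*" in the file at `path`.

    A backup named <file>.bak is written first. Returns the number of
    replacements made. (Hypothetical helper, not part of outsum or Stata.)
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)           # keep the untouched original
    text = path.read_text()
    path.write_text(text.replace("_all", "*"))
    return text.count("_all")
```

You would call it with the path reported by "which outsum", then go on to step 4 in Stata.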

Friday, November 29, 2013

Some thoughts on large data processing

First, be ready for digit/string tricks.

Stata is recommended.  SAS sucks.

Why does SAS suck?  I will do a separate post to discuss it.

You need a full license of Stat/Transfer.

Be ready to compress data using the compress command in Stata.

Use a codebook.

Do not try to append all the data files into one file.  Instead, put them in one folder and create global code to index them.
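
The last point can be sketched in Python as follows. This is only an illustration under some assumptions: the files are CSV rather than Stata .dta, and the names index_data_files, count_rows and total_rows are made up for the example. The idea is to build an index of the files in one folder and process them one at a time, so the full data never sits in memory at once.

```python
import csv
from pathlib import Path


def index_data_files(folder, pattern="*.csv"):
    """Return a sorted list of the data files in `folder` (the index)."""
    return sorted(Path(folder).glob(pattern))


def count_rows(path):
    """Count the data rows of one CSV file, skipping its header line."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)               # skip the header, if any
        return sum(1 for _ in reader)


def total_rows(folder):
    """Process the files one by one instead of appending them first."""
    return sum(count_rows(p) for p in index_data_files(folder))
```

Any per-file summary can take the place of count_rows; only the small results are combined at the end.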

Sunday, September 8, 2013

Stata Data Merge: Append and Merge

Simply put,

"Append" is for vertical data combination.

"Merge" is for horizontal data combination.

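For merge, always keep the bigger data set in memory and merge the smaller data set into it, which can save some time.

PS: a many:many (m:m) option is also available, but it pairs observations within each key by row order, so first check whether 1:1, m:1 or 1:m is what you actually need.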
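
To make the vertical/horizontal distinction concrete, here is a small Python sketch that treats a data set as a list of rows (dicts). The functions append and merge are illustrative stand-ins, not Stata's commands. The merge here assumes the key is unique in the second data set and keeps only the rows of the first one, which is a simplification of what Stata does.

```python
def append(top, bottom):
    """Vertical combination: stack the rows of `bottom` under `top`."""
    return list(top) + list(bottom)


def merge(master, using, key):
    """Horizontal combination: add the columns of `using` to `master`.

    Assumes `key` is unique in `using`. Rows of `master` without a match
    are kept unchanged; on a name clash the value from `master` wins.
    """
    lookup = {row[key]: row for row in using}
    return [{**lookup.get(row[key], {}), **row} for row in master]


prices = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
more_prices = [{"id": 3, "price": 30}]
names = [{"id": 1, "name": "wheat"}]

stacked = append(prices, more_prices)    # 3 rows, same columns
widened = merge(prices, names, "id")     # 2 rows, extra column where matched
```

Append adds rows and leaves the columns alone; merge adds columns and leaves the rows of the data set in memory alone.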

Saturday, June 9, 2012

Some thoughts on robust regression (1)

A parameter can have several estimators.  Among these, the "best" estimators always rely on some special assumptions that they exploit.  For instance, the CLR model makes several special assumptions, and if the data deviate from these assumptions, new estimation techniques must be applied, such as introducing dummy variables or 2SLS.

Since the "best" estimators are designed to use these assumptions, they "get hurt" more seriously than other estimators when some of the assumptions are violated.  Thus two kinds of special estimators should be considered, and we call them "robust" estimators.  The first kind is an estimator that is not sensitive to violations of the assumptions.  Consider a very common situation where we cannot tell whether any assumption is violated because of limited information or sample size.  In that case we may prefer an estimator that is not as good as the "best" ones but is also not sensitive to violations of the assumptions, i.e. a violation that the estimator has to suffer does not screw up the whole model.  It can be thought of as the "least worst" estimator.  This first kind of robust (insensitive) estimator is very common; for example, the OLS estimator itself is considered a robust estimator.

The other kind of robust estimator is designed to resist the violations.  If you have taken intermediate-level econometrics training, you must know such estimators, though you may not know they are called "robust" estimators.  The most commonly seen ones are those used to correct for heteroscedasticity.  One example involves structural change: we cannot estimate the var-cov matrix directly, so we have to estimate two var-cov matrices and use some kind of weighting technique to jointly determine the estimate of the var-cov matrix.  This example appeared in an exam in my master's econometrics class.

To be continued...

Monday, May 14, 2012

How to summarize the frequency of selected data in Excel

Admitting that the research I am currently doing is totally worthless, I did at least work out a way to summarize the frequency of selected data in Excel.

1. Generating criteria

For example, suppose you believe the data set contains only a few unique values, such as "1, 2, 4, 5", even though the data set is very large.  Then create a column of these unique values.

2. Using functions

Use "=FREQUENCY(data range, criteria range)" and select the corresponding data and criteria ranges.

3. Aftermath

First, starting from the formula cell, extend the selection over enough contiguous empty cells.  "Enough" means the number of empty cells should equal the number of unique values minus 1 (the first cell is the one where the FREQUENCY function was entered in step 2).

Second, press F2.

Third, press "Shift" + "Control" + "Enter".

All is done now.
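
To check the numbers outside Excel, the counting rule can be sketched in Python. The function name frequency is my own, and the sketch assumes the criteria values are sorted in ascending order. Each count covers the values greater than the previous criterion and up to and including the current one, and a final extra count collects anything above the largest criterion.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count the values of `data` per bin; `bins` must be ascending.

    counts[i] covers values > bins[i-1] and <= bins[i]; the last entry
    counts the values larger than every bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts


counts = frequency([1, 2, 2, 4, 5, 5, 5], [1, 2, 4, 5])
print(counts)  # [1, 2, 1, 3, 0]
```

When the criteria column lists every unique value in the data, the last entry is 0, so the first four counts are the frequencies of 1, 2, 4 and 5.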