We want models with a few agents, rather than those with only one or two or infinitely many.
We want to understand agents that are neither extremely brilliant nor extremely stupid, but rather live somewhere in the middle.
It is the interest in between stasis and utter chaos. The world tends not to be completely frozen or random, but rather it exists in between these two states.
It is the interest in between control and anarchy. We find robust patterns of organization and activity in systems that have no central control or authority.
We have corporations and human bodies that maintain a recognizable form and activity over long periods of time, even though their constituent parts exist on time scales that are orders of magnitude shorter.
It is the interest in between the continuous and the discrete. The behavior of systems as we transition between the continuous and discrete is often surprising. Many systems do not smoothly move between these two realms, but instead exhibit quite different patterns of behavior, even though from the outside they seem so “close.”
My interpretation of page 7
In a complicated system, the various elements that make up the system maintain a degree of independence from one another, such that removing one element does not fundamentally alter the system’s behavior beyond the effect directly attributable to the removed piece.
In a complex system, the dependencies among the elements are such that removing one element changes the system’s behavior to an extent that goes well beyond what is embodied by the particular element that was removed.
Complex systems can be fragile. They can also exhibit an unusual degree of robustness to less radical changes in their component parts as a result of a very powerful organizing force that can overcome a variety of changes to the lower-level components.
My interpretation of page 9
Social agents must predict and react to the actions and predictions of other agents. (p.10)
Rubin causal model
Information-theoretic choice among statistical models
A commonly used rule of thumb states that two models are indistinguishable by the AIC criterion if $|\mathrm{AIC}_1 - \mathrm{AIC}_2| < 2$.
As a rough rule of thumb, models whose AIC is within 1–2 of the minimum have substantial support and should receive consideration in making inferences. Models whose AIC is within about 4–7 of the minimum have considerably less support, while models whose AIC is more than 10 above the minimum either have essentially no support, and might be omitted from further consideration, or at least fail to explain some substantial structural variation in the data.
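The thresholds above can be turned into a small lookup. This is a minimal Python sketch: the function name `aic_support` and the example AIC values are illustrative, and differences that fall in the gaps the rule of thumb leaves open (between 2 and 4, and between 7 and 10) are labelled "intermediate", which is our own assumption rather than part of the rule.

```python
def aic_support(aic_values):
    """Label each model by its AIC difference from the best (minimum-AIC) model.

    Thresholds follow the rough rule of thumb: <= 2 substantial support,
    4-7 considerably less support, > 10 essentially no support.
    The 'intermediate' label for the gaps is an assumption of this sketch.
    """
    aic_min = min(aic_values)
    labels = []
    for aic in aic_values:
        delta = aic - aic_min
        if delta <= 2:
            label = "substantial support"
        elif 4 <= delta <= 7:
            label = "considerably less support"
        elif delta > 10:
            label = "essentially no support"
        else:
            label = "intermediate"  # gaps not covered by the rule of thumb
        labels.append((delta, label))
    return labels


# Illustrative AIC values for four candidate models.
for delta, label in aic_support([100, 102, 105, 112]):
    print(delta, label)
```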
Denote the AIC values of the candidate models by $\mathrm{AIC}_1, \mathrm{AIC}_2, \mathrm{AIC}_3, \ldots, \mathrm{AIC}_R$, and let $\mathrm{AIC}_{\min}$ denote the minimum of those values. Then
$\exp\big((\mathrm{AIC}_{\min} - \mathrm{AIC}_i)/2\big)$ can be interpreted as the relative probability that the $i$th model minimizes the (expected estimated) information loss.
As an example, suppose that there were three models in the candidate set, with AIC values 100, 102, and 110.
Then the second model is $\exp\big((100 - 102)/2\big) = 0.368$ times as probable as the first model to minimize the information loss,
and the third model is $\exp\big((100 - 110)/2\big) = 0.007$ times as probable as the first model to minimize the information loss.
In this case, we might omit the third model from further consideration and take a weighted average of the first two models, with weights 1 and 0.368, respectively. Statistical inference would then be based on the weighted multimodel.
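A minimal Python sketch of the relative-probability calculation, reproducing the three-model example above. Normalizing the relative likelihoods so that they sum to 1 gives what are usually called Akaike weights; that normalization step and the function names are our additions, not part of the text above.

```python
import math


def relative_likelihoods(aic_values):
    """exp((AIC_min - AIC_i) / 2) for each candidate model."""
    aic_min = min(aic_values)
    return [math.exp((aic_min - aic) / 2) for aic in aic_values]


def akaike_weights(aic_values):
    """Relative likelihoods rescaled so that they sum to 1."""
    rel = relative_likelihoods(aic_values)
    total = sum(rel)
    return [r / total for r in rel]


aics = [100, 102, 110]
print([round(r, 3) for r in relative_likelihoods(aics)])  # [1.0, 0.368, 0.007]
print(akaike_weights(aics))
```

Dropping the third model and renormalizing the remaining two relative likelihoods (1 and 0.368) gives the weights for the two-model average described above.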
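AIC is often considered less suitable than BIC for very large data sets: its penalty $2k$ does not grow with the sample size $n$, whereas the BIC penalty $k \log n$ does, so AIC is not consistent and keeps a non-vanishing probability of selecting an overly complex model as $n$ grows.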
In addition to BIC, you may find it useful to apply the bias-corrected version of the AIC criterion, AICc (you may use R code for this, or the formula $\mathrm{AIC}_c = \mathrm{AIC} + \frac{2p(p+1)}{n - p - 1}$, where $p$ is the number of estimated parameters and $n$ is the number of observations).
The same rule of thumb applies to AICc differences.
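The AICc correction is easy to compute directly. This is a minimal sketch of the formula above, with `aicc` a hypothetical helper name; it assumes $n > p + 1$, since the correction is undefined otherwise.

```python
def aicc(aic, p, n):
    """Bias-corrected AIC: AIC + 2p(p+1) / (n - p - 1).

    p: number of estimated parameters, n: number of observations.
    """
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return aic + 2 * p * (p + 1) / (n - p - 1)


print(aicc(100.0, p=3, n=20))    # 101.5
print(aicc(100.0, p=3, n=2000))  # correction is now tiny
```

The correction term shrinks toward 0 as $n$ grows, so AICc and AIC agree for large samples.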
One cannot compare two models with AIC if they do not model the same response variable.
AIC can be used to compare both nested and non-nested models.
A Gaussian log-likelihood is given by $\log L(\theta) = -\frac{|D|}{2}\log(2\pi) - \frac{1}{2}\log|K| - \frac{1}{2}(x-\mu)^T K^{-1}(x-\mu)$, where $K$ is the covariance structure of your model, $|D|$ is the number of points in your data set, $\mu$ is the mean response and $x$ is your dependent variable.
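AIC is calculated as $\mathrm{AIC} = 2k - 2\log L$, where $k$ is the number of estimated parameters in your model (in a mixed model this includes the variance parameters, not only the fixed effects) and $L$ is the maximized value of the likelihood function.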
In practice, it trades off model complexity (the $2k$ term, related to variance) against goodness of fit (the $-2\log L$ term, related to bias) in your modelling assumptions.
When you calculate your log-likelihood in practice, you look at two terms: a fit term, $-\frac{1}{2}(x-\mu)^T K^{-1}(x-\mu)$, and a complexity penalization term, $-\frac{1}{2}\log|K|$.
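To make the decomposition concrete, here is a minimal NumPy sketch that evaluates the three terms of the Gaussian log-likelihood separately and then plugs their sum into $2k - 2\log L$. It assumes NumPy is available and that $K$ is a valid (positive definite) covariance matrix; the function names, the example data and the choice $k = 2$ are illustrative only.

```python
import numpy as np


def gaussian_loglik_terms(x, mu, K):
    """Return the (fit, complexity, constant) terms of a Gaussian log-likelihood."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    K = np.asarray(K, dtype=float)
    n_points = x.shape[0]                            # |D|
    resid = x - mu
    sign, logdet = np.linalg.slogdet(K)              # numerically stable log|K|
    if sign <= 0:
        # a valid covariance matrix has a positive determinant
        raise ValueError("K must have a positive determinant")
    fit = -0.5 * resid @ np.linalg.solve(K, resid)   # -1/2 (x-mu)^T K^{-1} (x-mu)
    complexity = -0.5 * logdet                       # -1/2 log|K|
    constant = -0.5 * n_points * np.log(2 * np.pi)   # -|D|/2 log(2 pi)
    return fit, complexity, constant


def aic(loglik, k):
    """AIC = 2k - 2 log L."""
    return 2 * k - 2 * loglik


# Illustrative data: three observations with independent errors of variance 0.5.
x = np.array([1.0, 2.0, 3.0])
mu = np.array([1.1, 1.9, 3.2])
K = 0.5 * np.eye(3)

fit, complexity, constant = gaussian_loglik_terms(x, mu, K)
loglik = fit + complexity + constant
print(fit, complexity, constant, aic(loglik, k=2))
```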
Aside: as Wikipedia notes, for least-squares models with normally distributed errors, AIC can also be written, up to an additive constant, as $|D|\log(\mathrm{RSS}/|D|) + 2k$. This form makes it even more obvious why models with different dependent variables are not comparable: the RSS values in the two cases are simply incommensurable.
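The RSS form also shows the incomparability numerically. In this minimal pure-Python sketch (made-up data, hypothetical helper names), the same straight-line model is fitted to a response and to that response expressed in different units (multiplied by 1000). The quality of the fit is identical, yet the AIC shifts by $2|D|\log 1000$, purely because the RSS is measured on a different scale.

```python
import math


def ols_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def aic_from_rss(rss, n, k):
    """|D| * log(RSS / |D|) + 2k, valid up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2]   # made-up illustrative data
y_scaled = [1000 * v for v in y]     # same response in different units

k = 2  # intercept and slope; counting the error variance too would not change the difference
aic_y = aic_from_rss(ols_rss(x, y), len(x), k)
aic_scaled = aic_from_rss(ols_rss(x, y_scaled), len(x), k)
print(aic_scaled - aic_y)  # equals 2 * |D| * log(1000), not a difference in fit
```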
AIC is based on the Kullback–Leibler (KL) divergence (roughly speaking, a measure of the difference between two distributions): it estimates, up to a constant shared by all candidate models, how far the distribution your model assumes is from the unknown true distribution of your data. That is why a smaller AIC score is better: your model is estimated to be closer to the true distribution of your data.
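For intuition about what KL divergence measures, here is a minimal sketch using the closed-form KL divergence between two univariate normal distributions. It is only illustrative: in practice the true distribution is unknown, which is exactly why AIC estimates the divergence from the maximized log-likelihood instead of computing it directly. The function name is our own.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


print(kl_gaussian(0, 1, 0, 1))  # 0.0: identical distributions
print(kl_gaussian(0, 1, 1, 1))  # 0.5: mean shifted by one standard deviation
print(kl_gaussian(0, 1, 0, 2), kl_gaussian(0, 2, 0, 1))  # not symmetric
```

Note that KL divergence is not symmetric: swapping the two distributions generally changes the value, so it is a divergence rather than a distance.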
When using AIC:
- You cannot use it to compare models fitted to different data sets.
- You should use the same response variable for all the candidate models.
- You should have $|D| \gg k$; otherwise the large-sample (asymptotic) approximation on which AIC relies is poor.
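The checklist above can be encoded as a guard to run before comparing AIC values. This is a hypothetical sketch: the `FittedModel` record and its fields are illustrative, and the cut-off of 40 data points per parameter, below which it suggests AICc, is an assumed, commonly quoted rule of thumb rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Hypothetical record describing one fitted candidate model."""
    name: str
    response: str   # name of the modelled response variable
    dataset: str    # identifier of the data set used for fitting
    n: int          # number of data points |D|
    k: int          # number of estimated parameters
    aic: float


def check_aic_comparable(models, min_ratio=40):
    """Raise if an AIC comparison is invalid; return warnings about small |D|/k.

    min_ratio=40 is an assumed rule-of-thumb threshold for preferring AICc.
    """
    if len({m.response for m in models}) > 1:
        raise ValueError("models do not share the same response variable")
    if len({m.dataset for m in models}) > 1:
        raise ValueError("models were fitted to different data sets")
    warnings = []
    for m in models:
        ratio = m.n / m.k
        if ratio < min_ratio:
            warnings.append(f"{m.name}: |D|/k = {ratio:.1f}; consider AICc")
    return warnings


# Illustrative candidates fitted to the same data and response.
models = [
    FittedModel("linear", "y", "survey", n=200, k=2, aic=512.3),
    FittedModel("cubic", "y", "survey", n=200, k=8, aic=509.8),
]
print(check_aic_comparable(models))
```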