SMHS LinearModeling StatsSoftware

From SOCR
Revision as of 12:18, 21 May 2016 by Dinov (talk | contribs) (SMHS Linear Modeling - Statistical Software)

SMHS Linear Modeling - Statistical Software

This section briefly describes the pros and cons of different statistical software platforms.

Statistical Software: Advantages and Disadvantages

R
  Advantages
  • R is actively maintained (100,000 developers, 15K packages)
  • Excellent connectivity to various types of data and other systems
  • Versatile for solving problems in many domains
  • Free and open source
  • Anybody can access, review, and extend the source code
  • R is very stable and reliable
  • If you change or redistribute the R source code, you have to make those changes available for anybody else to use
  • R runs anywhere (platform agnostic)
  • Extensibility: R supports extensions, e.g., for data manipulation, statistical modeling, and graphics
  • An active and engaged community supports R
  • Unparalleled question-and-answer (Q&A) websites
  • R connects with other languages (Java, C, JavaScript, Python, Fortran), with database systems, and with other programs such as SAS and SPSS
  • Other packages have add-ons to connect with R. SPSS has incorporated a link to R, and SAS has protocols to move data and graphics between the two packages
  Disadvantages
  • Mostly a scripting language
  • Steeper learning curve

SAS (http://www.sas.com)
  Advantages
  • Handles large datasets
  • Commonly used in business and government
  Disadvantages
  • Expensive and proprietary
  • Somewhat dated programming language

Stata
  Advantages
  • Easy statistical analyses
  Disadvantages
  • Mostly classical statistics

SPSS
  Advantages
  • Appropriate for beginners
  • Simple interfaces
  Disadvantages
  • Weak in more cutting-edge statistical procedures; lacking in robust methods and survey methods

More comparisons are available online: UCLA/ATS and Wikipedia.

Google Scholar Research Article Publications

Year R SAS SPSS
1995 8 8620 6450
1996 2 8670 7600
1997 6 10100 9930
1998 13 10900 14300
1999 26 12500 24300
2000 51 16800 42300
2001 133 22700 68400
2002 286 28100 88400
2003 627 40300 78600
2004 1180 51400 137000
2005 2180 58500 147000
2006 3430 64400 142000
2007 5060 62700 131000
2008 6960 59800 116000
2009 9220 52800 61400
2010 11300 43000 44500
2011 14600 32100 32000
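library(ggplot2)
library(reshape2)  # reshape2::melt() supports the variable.name argument

# read the yearly Google Scholar counts for R, SAS, and SPSS
Data_R_SAS_SPSS_Pubs <- read.csv('https://umich.instructure.com/files/522067/download?download_frd=1', header = TRUE)
df <- data.frame(Data_R_SAS_SPSS_Pubs)

# convert to long format: one row per (Year, Software) pair
df <- melt(df, id.vars = 'Year', variable.name = 'Software')

# one line per package; on ggplot2 older than 3.4 use size = 4 instead of linewidth = 4
ggplot(data = df, aes(x = Year, y = value, colour = Software, group = Software)) +
  geom_line(linewidth = 4) +
  labs(x = 'Year', y = 'Citations')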

[Figure: SMHS LinearModeling Fig002.png. Line plot of the yearly Google Scholar counts for R, SAS, and SPSS.]
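
To make the wide-to-long conversion concrete, here is a minimal base-R sketch that rebuilds the table by hand, so it runs without the download or any extra packages. The object names (pubs, long, peak_year) are illustrative assumptions and do not appear in the original materials. The peak-year summary is an added convenience, not part of the original analysis.

```r
# Counts transcribed from the Google Scholar table in this section (1995-2011)
pubs <- data.frame(
  Year = 1995:2011,
  R    = c(8, 2, 6, 13, 26, 51, 133, 286, 627, 1180, 2180, 3430,
           5060, 6960, 9220, 11300, 14600),
  SAS  = c(8620, 8670, 10100, 10900, 12500, 16800, 22700, 28100, 40300,
           51400, 58500, 64400, 62700, 59800, 52800, 43000, 32100),
  SPSS = c(6450, 7600, 9930, 14300, 24300, 42300, 68400, 88400, 78600,
           137000, 147000, 142000, 131000, 116000, 61400, 44500, 32000)
)

# Wide -> long by hand: repeat Year once per package and stack the counts.
# This mirrors what a melt() call with id.vars = 'Year' produces.
long <- data.frame(
  Year     = rep(pubs$Year, times = 3),
  Software = rep(c("R", "SAS", "SPSS"), each = nrow(pubs)),
  value    = c(pubs$R, pubs$SAS, pubs$SPSS)
)

# Year in which each package's count is largest
peak_year <- sapply(pubs[c("R", "SAS", "SPSS")],
                    function(v) pubs$Year[which.max(v)])
print(peak_year)
```

The long data frame has one row per (Year, Software) pair, which is the layout ggplot2 expects when one column is mapped to colour. In this table the counts peak in 2006 for SAS and 2005 for SPSS, while the R count is still rising in 2011, the last year listed.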

Next see

See the Quality Control section for a discussion of data Quality Control (QC) and Quality Assurance (QA), which are important components of data-driven modeling, analytics, and visualization.




