SMHS LinearModeling StatsSoftware

From SOCR
Revision as of 12:16, 21 May 2016 by Dinov (talk | contribs) (SMHS Linear Modeling - Statistical Software)

SMHS Linear Modeling - Statistical Software

This section briefly describes the pros and cons of different statistical software platforms.

Statistical Software: Advantages and Disadvantages

R
  Advantages:
    • R is actively maintained (100,000 developers, 15K packages)
    • Excellent connectivity to various types of data and other systems
    • Versatile for solving problems in many domains
    • It's free, open-source code
    • Anybody can access/review/extend the source code
    • R is very stable and reliable
    • If you change and redistribute the R source code, you must make those changes available for anybody else to use
    • R runs anywhere (platform agnostic)
    • Extensibility: R supports extensions, e.g., for data manipulation, statistical modeling, and graphics
    • An active and engaged community supports R
    • Unparalleled question-and-answer (Q&A) websites
    • R connects with other languages (Java/C/JavaScript/Python/Fortran), database systems, and other programs (SAS, SPSS, etc.)
    • Other packages have add-ons to connect with R. SPSS has incorporated a link to R, and SAS has protocols to move data and graphics between the two packages
  Disadvantages:
    • Mostly a scripting language
    • Steeper learning curve

SAS (http://www.sas.com)
  Advantages:
    • Handles large datasets
    • Commonly used in business and government
  Disadvantages:
    • Expensive and proprietary
    • Somewhat dated programming language

Stata
  Advantages:
    • Easy statistical analyses
  Disadvantages:
    • Mostly classical statistics

SPSS
  Advantages:
    • Appropriate for beginners
    • Simple interfaces
  Disadvantages:
    • Weak in more cutting-edge statistical procedures; lacking in robust methods and survey methods

More comparisons are available online: UCLA/ATS and Wikipedia.

Google Scholar Research Article Publications

Year        R      SAS     SPSS
1995        8     8620     6450
1996        2     8670     7600
1997        6    10100     9930
1998       13    10900    14300
1999       26    12500    24300
2000       51    16800    42300
2001      133    22700    68400
2002      286    28100    88400
2003      627    40300    78600
2004     1180    51400   137000
2005     2180    58500   147000
2006     3430    64400   142000
2007     5060    62700   131000
2008     6960    59800   116000
2009     9220    52800    61400
2010    11300    43000    44500
2011    14600    32100    32000

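    library(ggplot2)
    library(reshape2)  # reshape2::melt() supports the variable.name/value.name arguments
    # Load the yearly publication counts (expected columns: Year, R, SAS, SPSS)
    Data_R_SAS_SPSS_Pubs <- read.csv('https://umich.instructure.com/files/522067/download?download_frd=1', header=TRUE)
    df <- data.frame(Data_R_SAS_SPSS_Pubs)
    # Convert to long format: one row per (Year, Software) pair
    df <- melt(df, id.vars = 'Year', variable.name = 'Software', value.name = 'Count')
    # Draw one thick line per software package
    ggplot(data=df, aes(x=Year, y=Count, colour=Software, group=Software)) +
      geom_line(size=4) + labs(x='Year', y='Citations')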
    

Figure (SMHS LinearModeling Fig002.png): line plot of the yearly Google Scholar counts for R, SAS, and SPSS.
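
The trends in the table above can also be summarized numerically. The following is a minimal base-R sketch, assuming the counts are transcribed from the table rather than downloaded; the names pubs, peak_year, and R_share are illustrative and not part of the original analysis.

```r
# Counts transcribed by hand from the Google Scholar table above (assumption:
# these match the CSV file used for the plot)
pubs <- data.frame(
  Year = 1995:2011,
  R    = c(8, 2, 6, 13, 26, 51, 133, 286, 627, 1180, 2180, 3430,
           5060, 6960, 9220, 11300, 14600),
  SAS  = c(8620, 8670, 10100, 10900, 12500, 16800, 22700, 28100, 40300,
           51400, 58500, 64400, 62700, 59800, 52800, 43000, 32100),
  SPSS = c(6450, 7600, 9930, 14300, 24300, 42300, 68400, 88400, 78600,
           137000, 147000, 142000, 131000, 116000, 61400, 44500, 32000)
)

# Year in which each package reached its maximum count
peak_year <- sapply(pubs[c("R", "SAS", "SPSS")],
                    function(counts) pubs$Year[which.max(counts)])
print(peak_year)  # R: 2011, SAS: 2006, SPSS: 2005

# Fraction of the combined three-package count attributed to R in each year
pubs$R_share <- pubs$R / (pubs$R + pubs$SAS + pubs$SPSS)
print(pubs[c("Year", "R_share")])
```

In this table, the SAS and SPSS counts peak in the mid-2000s and then decline, while the R count is still rising in 2011, the last year listed.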

Next see

The Quality Control section discusses data Quality Control (QC) and Quality Assurance (QA), which are important components of data-driven modeling, analytics, and visualization.




