S
The pioneering statistical computing language from Bell Labs that reshaped data analysis and gave rise to R and S-PLUS
Created by John Chambers, Rick Becker, and Allan Wilks (Bell Laboratories)
S is a programming language and interactive environment for statistical computing, data analysis, and graphics, developed at Bell Laboratories beginning in 1976. Created primarily by John Chambers, together with Rick Becker, Allan Wilks, and other members of the Bell Labs statistics research group, S introduced an approach to data analysis that emphasized interactivity, extensibility, and the treatment of data and statistical methods as first-class programmable objects. Though largely superseded today by its open-source descendant R, S remains one of the most historically influential languages in the field of statistical computing.
History & Origins
The roots of S trace back to a series of meetings in the spring of 1976, when John Chambers, Rick Becker, Doug Dunn, and others at Bell Laboratories in Murray Hill, New Jersey, discussed how to build a more interactive alternative to the Fortran subroutine libraries that statisticians of the era relied upon. The motivation was strongly shaped by the exploratory data analysis movement championed by John Tukey, which prized looking at data flexibly and iteratively rather than running rigid, pre-planned batch computations.
The first working version of S was implemented in 1976 on the Honeywell GCOS operating system, with much of the early implementation done by Becker, Chambers, Dunn, and colleagues. Around 1979 the system was ported to UNIX, the operating system developed at Bell Labs, a move that made S far more portable and accessible. S began to spread beyond Bell Labs starting in 1980, initially to universities, and source licensing through AT&T followed in the mid-1980s.
The name “S” was chosen in the spirit of the C language (also a Bell Labs creation), with the single letter standing loosely for “statistics.”
Design Philosophy
S was designed around the idea that a statistician should be able to move fluidly between using a system interactively and programming it. As Chambers later described, the goal was to let users “turn ideas into software, quickly and faithfully.” Several principles defined the language:
- Interactivity first: Users could type expressions at a prompt and immediately see results or graphics, encouraging an iterative, exploratory style of analysis.
- Everything is an object: Data, functions, and results were all objects that could be inspected, stored, and passed around, long before this was common in mainstream languages.
- Extensibility: New statistical methods could be added by writing functions in S itself, rather than requiring users to drop down to a lower-level language.
- A gentle on-ramp with depth underneath: Casual users could accomplish a great deal with simple commands, while advanced users could write sophisticated programs.
Key Features
S brought together a number of capabilities that, taken together, were unusual for its time:
- Rich data structures including vectors, matrices, lists, and later data frames for tabular data.
- A functional programming style in which functions are values and computations are expressed as function calls operating on whole objects rather than element-by-element loops.
- Integrated graphics, treating visualization as a core part of data analysis rather than an afterthought.
- Model formula notation (introduced in the “White Book” era), allowing models to be specified concisely with expressions such as y ~ x1 + x2.
- Object-oriented class systems: the lightweight S3 system and the later, more formal S4 system, which provided classes, generic functions, and method dispatch.
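Several of these features can be seen together in a short sketch. It is written in S syntax and, as an assumption, run under R, the most widely available S dialect today; the original Bell Labs S may differ in details. The variable names and the helpers apply_twice and describe are hypothetical, chosen only for illustration, while c, mean, data.frame, lm, coef, cat, class, and UseMethod are standard functions.

```r
# Sketch in S syntax, run under R (assumption: original S may differ in details).

# Whole-object arithmetic: one expression operates on every element, no loop
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1              # 3 5 7 9 11
mean(y)                     # 7

# Functions are values: they can be passed to other functions
apply_twice <- function(f, v) f(f(v))   # hypothetical helper
apply_twice(sqrt, 16)       # 2

# Model formula notation: response ~ predictors
d <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = d)
coef(fit)                   # intercept approximately 1, slope approximately 2

# S3-style dispatch: the generic calls the method matching class(obj)
describe <- function(obj) UseMethod("describe")          # hypothetical generic
describe.default <- function(obj) cat("an object of class", class(obj), "\n")
describe.lm <- function(obj) cat("a linear model with", length(coef(obj)), "coefficients\n")
describe(fit)               # a linear model with 2 coefficients
describe(x)                 # an object of class numeric
```

The formula y ~ x names the response and predictor symbolically, so lm can look up the columns in d; changing the model means editing the formula, not the fitting code.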
Evolution
S evolved through several distinct generations, each documented by a book that the community came to know by the color of its cover:
| Era | Book | Year | Contribution |
|---|---|---|---|
| Original S | S: An Interactive Environment for Data Analysis and Graphics (“Brown Book”) | 1984 | Documented the early interactive system |
| New S | The New S Language (“Blue Book”) | 1988 | Redesigned the language around functions and a cleaner object model |
| Statistical Models | Statistical Models in S (“White Book”) | 1992 | Introduced model formulas and the S3 class system |
| S4 | Programming with Data (“Green Book”) | 1998 | Introduced the formal S4 class and methods system |
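The contrast between the White Book's S3 classes and the Green Book's S4 classes can be sketched as follows, again in S syntax run under R as an assumption. The class names sample and Sample4 and the generics summarize and summarize4 are hypothetical; structure, UseMethod, setClass, representation, setGeneric, standardGeneric, setMethod, and new are standard functions, the S4 ones coming from the methods package.

```r
library(methods)  # S4 tools; attached by default in most R sessions

# S3 (White Book): a class is just an attribute, and dispatch follows
# the naming convention generic.class
s <- structure(list(label = "trial A", values = c(2, 4, 6)), class = "sample")
summarize <- function(obj) UseMethod("summarize")        # hypothetical generic
summarize.sample <- function(obj) paste(obj$label, "mean =", mean(obj$values))
summarize(s)                # "trial A mean = 4"

# S4 (Green Book): classes, slots, generics, and methods are declared formally
setClass("Sample4", representation(label = "character", values = "numeric"))
setGeneric("summarize4", function(obj) standardGeneric("summarize4"))
setMethod("summarize4", "Sample4",
          function(obj) paste(obj@label, "mean =", mean(obj@values)))
s4 <- new("Sample4", label = "trial A", values = c(2, 4, 6))
summarize4(s4)              # "trial A mean = 4"

# S4 checks declared slot types when an object is created; S3 has no such check
bad_ok <- tryCatch({ new("Sample4", label = 42, values = 1); TRUE },
                   error = function(e) FALSE)
bad_ok                      # FALSE: label must be character
```

The S3 version needs no declarations at all, which is why it suited quick interactive work; the S4 version trades that convenience for declared slots and checked object creation.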
In parallel with these academic and research developments, a commercial implementation called S-PLUS was released in 1988 by Statistical Sciences, Inc. S-PLUS wrapped the S language in a supported, packaged product and became widely used in industry. Over the following decades its ownership passed through several companies, with TIBCO Software acquiring Insightful Corporation, the then-owner of S-PLUS, in 2008.
Current Relevance
The most significant chapter in S’s legacy is R. Begun in 1993 by Ross Ihaka and Robert Gentleman at the University of Auckland and released as free software later in the 1990s, R was designed as an open-source implementation closely compatible with S. R inherited S’s syntax, its functional and object-oriented ideas, and much of its approach to data and graphics, while adding its own innovations and an enormous package ecosystem (CRAN).
Today, working in pure S is rare—the language has effectively been superseded by R, and the commercial S-PLUS line has wound down. Where you encounter “S” in modern practice, it is usually as the conceptual ancestor whose ideas live on in R code, in the S3 and S4 object systems that R adopted directly, and in the model-formula syntax that statisticians use every day.
Why It Matters
S occupies a pivotal place in the history of programming languages for data. It demonstrated that an interactive, high-level, extensible language could become the natural medium for statistical thinking, blurring the line between using software and programming it. Its influence is visible not only in R—now one of the most popular languages for statistics and data science—but in the broader culture of reproducible, code-driven data analysis. John Chambers’s receipt of the 1998 ACM Software System Award recognized S as having “forever altered the way people analyze, visualize, and manipulate data,” a citation that captures just how far the ideas first sketched at Bell Labs in 1976 have traveled.
Notable Uses & Legacy
Bell Laboratories
Birthplace of S, where it served as the in-house environment for statistics research, exploratory data analysis, and graphics throughout the 1980s and 1990s
S-PLUS (TIBCO / Insightful)
Commercial implementation of S used in pharmaceutical, financial, and government analytics for statistical modeling and reproducible data analysis
Academic statistics departments
Adopted widely as a teaching and research environment for applied statistics, with S code used throughout textbooks and journal articles
Pharmaceutical and clinical research
Used via S-PLUS for clinical trial analysis and regulatory submissions before R, its open-source successor, became dominant
R programming language
R was built as a free, largely S-compatible implementation, carrying S's syntax and ideas to a global open-source community