SAS
A procedural, fourth-generation language and analytics environment for data management and statistical analysis, built around the DATA step and an extensive library of procedures
Created by Anthony Barr, James Goodnight, John Sall, and Jane Helwig (SAS Institute)
SAS — originally an acronym for Statistical Analysis System — is a procedural, fourth-generation programming language and a broad commercial software environment for data management, advanced analytics, and reporting. More than a single language, SAS is an integrated platform: a programming language wrapped around a vast library of pre-built statistical and data-processing procedures, all designed so that an analyst can read raw data, transform it, run rigorous statistical methods on it, and produce publication-ready output without leaving the system. For decades it has been the backbone of analytics in regulated industries where correctness, validation, and reproducibility matter more than novelty.
History & Origins
SAS grew out of an academic effort at North Carolina State University. Beginning around 1966, Anthony Barr was hired to write software for analysis of variance and regression on IBM System/360 mainframes, work funded by a consortium of agricultural experiment stations across the southern United States that needed a common tool for analyzing crop and agricultural data. Barr designed the fundamental structure and language; James Goodnight joined in 1968 and contributed the statistical engine, including general linear modeling.
Early releases circulated under year-based names — a limited SAS 71, followed by SAS 72, the first broadly distributed version, which introduced staples like the MERGE statement and missing-data handling. As demand grew beyond the original consortium, the founders — Barr, Goodnight, John Sall, and Jane Helwig, who wrote the first documentation — spun the project out of the university. SAS Institute Inc. was incorporated in 1976, the year most commonly cited as SAS’s debut as a commercial product. The company settled in North Carolina’s Research Triangle, eventually in Cary, where it remains headquartered today.
From there SAS grew into one of the most successful privately held software companies in the world. Goodnight, still the company’s CEO, and Sall together own the firm, which has famously remained private rather than going public.
Design Philosophy
SAS is built on a deliberately pragmatic, analyst-centered philosophy: give domain experts — statisticians, epidemiologists, actuaries — a language in which data preparation, analysis, and reporting are first-class and tightly integrated, and ship batteries-included procedures so that users call validated methods rather than reimplementing algorithms.
This philosophy shows up in the two-part rhythm of nearly all SAS programs:
- The DATA step is SAS’s data-engineering workhorse. It reads, transforms, joins, and reshapes data, executing an implicit loop that processes one observation (row) at a time through an in-memory buffer known as the Program Data Vector. The DATA step blends executable logic with declarative statements and gives the programmer fine-grained control over how each record is built.
- The PROC step invokes one of hundreds of pre-built procedures: PROC MEANS, PROC FREQ, PROC REG, PROC GLM, PROC SQL, and many more. Rather than writing a regression from scratch, the analyst calls the procedure and supplies options.
Because the procedures are tested, documented, and stable across decades, organizations in regulated fields can trust and audit their results — a major reason SAS became entrenched where validation is mandatory.
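The DATA step's implicit loop can be seen in a short sketch. The dataset names (readings, running), the variables, and the three inline records are hypothetical and chosen only for illustration. The second step never writes a loop, yet its statements run once per observation.

```sas
/* Hypothetical input: three daily readings typed inline for illustration. */
data readings;
    input day temp;
    datalines;
1 20.5
2 22.0
3 19.5
;
run;

data running;
    set readings;                    /* each pass of the implicit loop reads one observation */
    obs_number = _n_;                /* automatic variable _N_ counts loop iterations */
    total_temp + temp;               /* sum statement: total_temp starts at 0 and is retained */
    mean_so_far = total_temp / _n_;  /* running mean built from the retained total */
run;                                 /* an implicit OUTPUT writes the row at the end of each pass */

proc print data=running;
run;
```

Ordinary assigned variables are reset to missing at the start of each iteration. The sum statement is one of the ways to carry a value from one observation to the next.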
Key Features
| Feature | What it provides |
|---|---|
| DATA step | Row-by-row data manipulation via the implicit-loop Program Data Vector model |
| PROC library | Hundreds of validated statistical, data-management, and reporting procedures |
| SAS macro language | Text-substitution metaprogramming (%MACRO, & macro variables) for code generation and reuse |
| ODS (Output Delivery System) | Renders output to HTML, PDF, RTF, Excel, and more |
| PROC SQL | Embedded SQL for relational querying inside SAS |
| Backward compatibility | Decades-old SAS programs typically still run on current releases |
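Three of the features in the table (the macro language, PROC SQL, and ODS) are often combined. The sketch below is illustrative only: the macro name summarize and its parameters are hypothetical, and it assumes the sashelp.class sample dataset that ships with SAS and a writable working directory for the HTML file.

```sas
/* Hypothetical macro: summarize one numeric column of a dataset by group. */
%macro summarize(ds=, group=, var=);
    proc sql;
        select &group,
               count(*)   as n,
               mean(&var) as mean_&var   /* resolves to mean_height in the call below */
        from &ds
        group by &group;
    quit;
%mend summarize;

ods html file="summary.html";            /* route output to an HTML file (path is an assumption) */
%summarize(ds=sashelp.class, group=sex, var=height)
ods html close;                          /* close the destination so the file is complete */
```

The macro processor substitutes text before the generated PROC SQL code is compiled, so the same macro can be reused with a different dataset or column by changing only the parameters.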
SAS’s type system is intentionally minimal: variables are either numeric or character, with no rich type hierarchy. This simplicity, combined with the implicit data-flow model of the DATA step, makes SAS approachable for analysts who are not professional programmers while still scaling to very large datasets.
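A brief sketch of the two-type model, using hypothetical variable names. Even dates are ordinary numbers that a format renders for display, and PROC CONTENTS reports every variable as either Num or Char.

```sas
data types_demo;
    length name $ 12;             /* character variable, up to 12 bytes */
    name         = 'Ada';
    count        = 42;            /* numeric: integers and decimals share one type */
    ratio        = 0.75;
    start        = '01JAN1960'd;  /* date literal: stored as the number 0 */
    missing_num  = .;             /* numeric missing value */
    missing_char = ' ';           /* character missing value (a blank) */
    format start date9.;          /* display only: the stored value stays numeric */
run;

proc contents data=types_demo;    /* the Type column lists only Num or Char */
run;
```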
A Taste of the Language
A small program reads some data and summarizes it, illustrating the DATA-step / PROC-step pairing:
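A minimal sketch of such a program follows. The four inline records are made-up values for illustration, not data from any real source.

```sas
data sales;
    input region $ amount;   /* $ marks region as character; amount is numeric */
    datalines;
East 120
West 95
North 150
South 80
;
run;

proc means data=sales;       /* summary statistics for the dataset built above */
    var amount;
run;
```

With no statistics requested, PROC MEANS reports N, mean, standard deviation, minimum, and maximum for each analysis variable.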
The data step builds a dataset named sales one observation at a time; the proc means step then computes summary statistics over it. The $ marks region as a character variable — one of SAS’s only two fundamental types.
Evolution
SAS evolved from year-named mainframe releases (SAS 71, 72, 76, 79, 82) into a sequentially versioned, multi-platform product. Version 6, beginning in the mid-1980s, ushered in a long, portable era in which the macro language matured and SAS was ported across many operating environments. The Output Delivery System arrived with Version 7 and was expanded in SAS 8 (1999), which also brought the Enterprise Guide GUI, broadening SAS beyond pure programming. The SAS 9 platform (launched 2002, with rollout through 2004) added a metadata-server architecture and multithreaded procedures, and SAS 9.4 (2013) became the durable foundation of the modern 9.x line. It still receives maintenance releases, the latest being 9.4M9 in 2025.
The most significant recent shift is SAS Viya, launched in 2016: a cloud-native, container- and Kubernetes-based platform built for distributed, in-memory analytics and designed to interoperate with open-source languages. Viya represents SAS’s strategic answer to the rise of Python and R, letting those languages drive SAS’s analytics engine while retaining the governance and support enterprises expect.
Current Relevance
SAS Institute remains privately held and is among the largest privately owned software companies in the world, with annual revenue measured in the billions and a global workforce of more than ten thousand. Its position in the market is a study in contrasts. In academia and the broader data-science community, open-source R and Python have steadily displaced SAS, helped by zero licensing cost and enormous library ecosystems. Yet in regulated enterprise settings (pharmaceutical FDA submissions, banking risk and compliance, government statistics, insurance) SAS retains a deep moat, because the cost of switching away from validated, audited, decades-stable analytics pipelines is enormous.
SAS’s modernization strategy centers on Viya and on embracing rather than resisting open source: analysts can write Python or R against SAS’s compute engine, blending familiar tooling with enterprise-grade scalability and governance.
Why It Matters
SAS demonstrated, earlier and more thoroughly than almost any other system, that statistical computing could be packaged as a dependable industrial product. By integrating data wrangling, analysis, and reporting behind a consistent language and a library of validated procedures, it made sophisticated analytics accessible to domain experts and trustworthy enough for life-or-death decisions in medicine and high-stakes decisions in finance. Its DATA-step model, its macro system, and its procedure-driven workflow shaped how a generation of analysts thought about working with data. Even as open-source tools reshape the field, SAS’s influence — and its entrenched presence in the world’s most regulated industries — endures.
Notable Uses & Legacy
Pharmaceutical & clinical trials
SAS is the de facto standard for analyzing clinical-trial data and preparing regulatory submissions to the U.S. FDA, where validated, reproducible procedures and CDISC data standards are central to the approval process
Banking & financial services
Banks use SAS for credit scoring, risk modeling, regulatory compliance reporting, and fraud detection across large transactional datasets
Government & public sector
Statistical agencies, tax authorities, and census organizations use SAS for large-scale data processing, official statistics, and fraud or tax-evasion detection
Healthcare & life sciences
Health outcomes research, epidemiology, and payer/provider analytics rely on SAS for managing and analyzing complex, sensitive datasets
Insurance
Actuaries and insurers use SAS for pricing models, reserving, and predictive risk analytics
Academia & institutional research
Universities have long used SAS for statistics instruction and institutional research, though open-source tools have eroded its academic dominance in recent years