Variables and Types in R
Learn how R handles variables, atomic types, type coercion, and special values — with its vector-first philosophy and dynamic, strong typing
R is dynamically typed and strongly typed: you never declare a variable’s type, but R will not, for example, add a number to a string unless you convert one of them explicitly. What makes R unusual compared to most general-purpose languages is its vector-first design. There are no true scalars in R: a number like 30 is stored as a numeric vector of length 1. This decision shapes how assignment, type checking, and coercion behave, and it’s the single most important idea to internalize when learning R.
R’s typing system is also pragmatic for data analysis. The default numeric type is a double-precision floating-point number (not an integer), missing values are represented by a dedicated NA sentinel rather than an error, and entire vectors can be converted between types in a single call. In this tutorial you’ll learn R’s assignment operators, the atomic types you’ll use every day, how coercion works between them, and the handful of special values that make R feel distinctly statistical.
Assignment and Basic Types
R offers several assignment operators, but <- is by far the most idiomatic. It reads as “gets” (age gets 30) and is preferred by nearly every R style guide. The = operator also assigns at the top level, but it is more commonly seen inside function calls, where it names an argument rather than creating a variable. A rightward form -> exists but is rarely seen in practice.
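A short sketch of the three assignment forms and of = used to name an argument. The variable names are illustrative assumptions, not taken from the tutorial's own listing.

```r
# Leftward assignment: the idiomatic form, read as "age gets 30"
age <- 30

# Equals sign: also assigns when used at the top level
height = 5.9

# Rightward assignment: valid, but rarely seen in practice
"Ada" -> name

# Inside a call, = names an argument (digits) instead of creating a variable
rounded <- round(3.14159, digits = 2)

print(age)
print(height)
print(name)
print(rounded)
```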
R’s atomic types fall into a short list: numeric (double-precision floating point), integer (requires the L suffix for literals), character (strings), logical (TRUE/FALSE), plus complex and raw which are less common. Notice below that a plain number like 30 is classed as "numeric", not "integer" — this is often surprising to newcomers.
Create a file named variables.R:
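A minimal version of variables.R, written to be consistent with the expected output listed further down. The variable names and the use of the rightward operator for the last value are assumptions.

```r
# variables.R: basic assignment and atomic types (illustrative sketch)

age <- 30            # numeric (double), even though it looks like an integer
height <- 5.9        # numeric
name <- "Ada"        # character
is_student <- TRUE   # logical

print(age)
print(height)
print(name)
print(is_student)

# class() reports the user-facing type of each value
print(class(age))
print(class(height))
print(class(name))
print(class(is_student))

# The L suffix creates a true integer
count <- 100L
print(count)
print(class(count))

# Rightward assignment also works (assumed here for illustration)
3.14159 -> pi_approx
print(pi_approx)
```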
Run it with:
Rscript variables.R
Expected Output
[1] 30
[1] 5.9
[1] "Ada"
[1] TRUE
[1] "numeric"
[1] "numeric"
[1] "character"
[1] "logical"
[1] 100
[1] "integer"
[1] 3.14159
The [1] prefix shows that each “value” is actually a length-1 vector: R is reminding you that 30 is really the first element of a numeric vector. Character strings print with quotes and logicals print unquoted. A whole-number double such as 30 and an integer such as 100L both print without a decimal point, so printed output alone cannot tell them apart; class() can. The class() function returns R’s user-facing type name.
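The length-1 vector idea can be checked directly. The sketch below uses only base R functions; the variable x is an illustrative name.

```r
x <- 30

print(length(x))            # a "single" number has length 1
print(is.vector(x))         # TRUE: it is a vector
print(identical(x, c(30)))  # TRUE: c(30) builds the same object
print(x[1])                 # indexing the first element returns 30

# Printed output looks the same for 30 and 30L, but the classes differ
print(class(30))    # "numeric"
print(class(30L))   # "integer"
```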
Type Coercion and Conversion
Because R is strongly typed, it will not silently let you add a number to a string. But R offers explicit conversion functions (as.numeric, as.integer, as.character, as.logical) and also applies automatic coercion rules inside vectors and arithmetic expressions. The coercion hierarchy is: logical → integer → numeric → character. When values of different types appear together, R promotes everything to the most general type in this chain.
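Both halves of this paragraph can be sketched in a few lines: arithmetic on a number and a string fails, while c() promotes mixed values along the hierarchy. The tryCatch wrapper is only there so the script keeps running after the error.

```r
# Arithmetic on a number and a string is an error, not a silent conversion
result <- tryCatch(
  1 + "2",
  error = function(e) conditionMessage(e)  # capture the error message text
)
print(result)

# An explicit conversion makes the intent clear
print(1 + as.numeric("2"))   # 3

# Inside c(), mixed values are promoted to the most general type present
print(class(c(TRUE, 1L)))             # "integer"
print(class(c(TRUE, 1L, 2.5)))        # "numeric"
print(class(c(TRUE, 1L, 2.5, "a")))   # "character"
```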
A particularly useful quirk: logicals are fully usable in arithmetic, with TRUE becoming 1 and FALSE becoming 0. This makes sum(x > 0) an idiomatic way to count matching elements in a vector — the comparison produces logicals, and sum() adds them as numbers.
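A small sketch of the counting idiom, using a made-up vector x. The same coercion lets mean() report the proportion of matching elements.

```r
x <- c(-2, 5, 0, 3, -1, 8)

positive <- x > 0       # comparison yields a logical vector
print(positive)         # FALSE TRUE FALSE TRUE FALSE TRUE

print(sum(positive))    # 3: each TRUE counts as 1
print(mean(positive))   # 0.5: the share of elements that are positive
```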
Create a file named type_conversion.R:
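A minimal version of type_conversion.R, written to be consistent with the expected output listed further down. The variable names and the invalid input string are assumptions.

```r
# type_conversion.R: explicit conversion and automatic coercion (illustrative sketch)

# String to number
num <- as.numeric("42")
print(num)
print(class(num))

# Number to string
txt <- as.character(30)
print(txt)
print(class(txt))

# Logicals in arithmetic: TRUE is 1, FALSE is 0
print(TRUE + TRUE)

# Mixed values in c() are promoted to the most general type
mixed <- c(1, 2, TRUE, FALSE)
print(mixed)
print(class(mixed))

# An invalid conversion warns and yields NA instead of stopping the script
bad <- as.numeric("hello")
print(bad)
```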
Run it with:
Rscript type_conversion.R
Expected Output
[1] 42
[1] "numeric"
[1] "30"
[1] "character"
[1] 2
[1] 1 2 1 0
[1] "numeric"
[1] NA
Notice how c(1, 2, TRUE, FALSE) produced the numeric vector 1 2 1 0: the logicals were promoted to numbers because numeric is higher in the coercion hierarchy than logical. The final NA shows R’s error-tolerant approach — an invalid conversion issues a warning and returns NA rather than halting the program, which matters when cleaning up messy real-world data.
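As a sketch of why this matters for messy data, the hypothetical vector raw_ages below mixes valid numbers with unusable entries. The warning is suppressed deliberately here because the NAs are expected and handled on the next lines.

```r
# Hypothetical raw input, e.g. a column read in as text
raw_ages <- c("34", "27", "unknown", "41", "")

# Convert the whole vector in one call; unusable entries become NA
ages <- suppressWarnings(as.numeric(raw_ages))
print(ages)                       # 34 27 NA 41 NA

# Count how many values failed to convert
print(sum(is.na(ages)))           # 2

# Summarize while skipping the missing values
print(mean(ages, na.rm = TRUE))   # 34
```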
Special Values and Vectors
R has four distinct “missing” or “unusual” values, and understanding them is essential for data analysis. NA represents a missing value of some type; for example, an empty cell in a numeric column of an imported CSV becomes NA. NULL represents the absence of a value altogether (different from “missing”). NaN is “not a number”, the result of operations like 0/0. Inf is positive infinity, the result of 1/0; its negative counterpart is -Inf.
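The difference between these values shows up most clearly inside a vector. The sketch below uses only base R: NA occupies a slot, NULL disappears, and the predicates for NA and NaN are not symmetric.

```r
# NA is a placeholder that keeps its position; NULL contributes nothing
print(length(c(1, NA, 2)))     # 3
print(length(c(1, NULL, 2)))   # 2
print(length(NULL))            # 0

# NaN counts as missing, but a plain NA is not NaN
print(is.na(NaN))    # TRUE
print(is.nan(NA))    # FALSE

# Division by zero gives signed infinities, not an error
print(1 / 0)         # Inf
print(-1 / 0)        # -Inf
```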
Finally, the vector is R’s fundamental data structure. You create one with the c() function (for “combine”). Because R has no true scalars, every value you’ve seen so far has really been a vector of length 1 — that’s what the [1] prefix in printed output has been hinting at.
Create a file named special_values.R:
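A minimal version of special_values.R, written to be consistent with the expected output listed further down. The variable names are assumptions, and so is the choice of length(30) for the line that prints 1.

```r
# special_values.R: special values and vectors (illustrative sketch)

# Each special value has its own predicate
print(is.na(NA))
print(is.null(NULL))
print(is.nan(0 / 0))
print(is.infinite(1 / 0))

# NA propagates through arithmetic and comparisons
print(NA + 1)
print(NA == NA)

# A "single" value is a vector of length 1 (assumed source of the 1 in the output)
print(length(30))

# Build a vector with c() and inspect it
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
print(length(numbers))
print(class(numbers))

# Vectorized arithmetic: no loop required
print(numbers * 2)

# Character vectors work the same way
fruits <- c("apple", "banana", "cherry")
print(fruits)
print(length(fruits))
```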
Run it with:
Rscript special_values.R
Expected Output
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] NA
[1] NA
[1] 1
[1] 1 2 3 4 5
[1] 5
[1] "numeric"
[1]  2  4  6  8 10
[1] "apple"  "banana" "cherry"
[1] 3
Two behaviors stand out. First, NA == NA is NA, not TRUE, because the comparison between two unknown values cannot itself be known. Always use is.na() to check for missing values, never == NA. Second, numbers * 2 works without writing a loop: R recycles the length-1 vector 2 across every element of numbers. This element-wise, vectorized behavior is central to how idiomatic R code is written.
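Both points can be seen on one small vector. The name scores and its values are made up for illustration.

```r
scores <- c(90, NA, 75, NA, 60)

# Comparing with == NA never answers the question
print(scores == NA)             # NA NA NA NA NA

# is.na() gives a usable logical vector
print(is.na(scores))            # FALSE TRUE FALSE TRUE FALSE

# Use it to keep only the known values
known <- scores[!is.na(scores)]
print(known)                    # 90 75 60

# Arithmetic is element-wise, and NA stays NA
print(scores * 2)               # 180 NA 150 NA 120

# Two vectors of equal length are combined position by position
print(c(1, 2, 3) + c(10, 20, 30))   # 11 22 33
```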
Key Concepts
- Idiomatic assignment is <-, not =. Both work at the top level, but R style guides and the community overwhelmingly prefer <-. The = form is usually reserved for naming arguments inside function calls.
- There are no scalars: everything is a vector. A “single” value like 30 is really a numeric vector of length 1, which is why printed output starts with [1]. This underlies R’s vectorized arithmetic.
- Numeric literals default to double-precision, not integer. Use the L suffix (100L) when you specifically need an integer type. Without it, class(100) is "numeric".
- R is dynamically typed but strongly typed. You don’t declare types, and arithmetic on incompatible types (such as a number plus a string) is an error unless you convert explicitly. Conversion uses the as.<type>() family of functions.
- Automatic coercion follows a hierarchy: logical → integer → numeric → character. When values of different types meet in a vector, everything is promoted to the most general type.
- Four special values serve distinct purposes: NA (missing), NULL (absent), NaN (undefined numeric result), and Inf (infinity). Each has a matching is.*() predicate, and NA propagates through arithmetic and comparisons.
- Variable names can contain letters, digits, dots, and underscores, but must start with a letter or a dot (and a leading dot cannot be followed by a digit). Names are case-sensitive (Age and age are different). Using . in names is common in base R; using _ is common in tidyverse code.
- Logicals behave as numbers in arithmetic. TRUE + TRUE is 2. This makes counting with sum() over a logical vector idiomatic and fast.
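The naming rules from the list above can be tried out directly. All names in this sketch are made up for illustration.

```r
# Names are case-sensitive: these are two separate variables
Age <- 41
age <- 30
print(Age == age)                  # FALSE

# Dots and underscores are both allowed inside names
total.count <- 10   # dotted style, common in base R
total_count <- 10   # underscore style, common in tidyverse code
print(total.count + total_count)   # 20

# A name may start with a dot, as long as a digit does not follow it
.scratch <- "starts with a dot"
print(.scratch)
```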