Beginner

Variables and Types in R

Learn how R handles variables, atomic types, type coercion, and special values — with its vector-first philosophy and dynamic, strong typing

R is dynamically typed and strongly typed: you never declare a variable’s type, but R will not silently mix incompatible types in an operation (adding a number to a string is an error until you convert one of them explicitly). What makes R unusual compared to most general-purpose languages is its vector-first design. There are no true scalars in R: a number like 30 is stored as a numeric vector of length 1. This decision shapes how assignment, type checking, and coercion behave, and it’s the single most important idea to internalize when learning R.

R’s typing system is also pragmatic for data analysis. The default numeric type is a double-precision floating-point number (not an integer), missing values are represented by a dedicated NA sentinel rather than an error, and entire vectors can be converted between types in a single call. In this tutorial you’ll learn R’s assignment operators, the atomic types you’ll use every day, how coercion works between them, and the handful of special values that make R feel distinctly statistical.

Assignment and Basic Types

R offers several assignment operators, but <- is by far the most idiomatic. It reads as “gets” (age gets 30) and is preferred by nearly every R style guide. The = operator also assigns at the top level, but inside a function call it names an argument instead of creating a variable, which is one reason style guides keep the two roles separate. A rightward form -> exists but is rarely seen in practice.
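
The difference between the operators is easiest to see side by side. The following is a minimal sketch; the variable names (a, b, c_val, avg, med) are made up for illustration, and exists() with inherits = FALSE is used only to check whether a name was created in the current environment.

```r
# Three assignment forms that all bind a name to a value
a <- 10        # leftward arrow: the idiomatic form
20 -> b        # rightward arrow: legal but rare
c_val = 30     # equals sign: works at the top level

# Inside a call, = names an argument and creates no variable.
avg <- mean(x = c(2, 4, 6))
x_created <- exists("x", envir = environment(), inherits = FALSE)

# Inside a call, <- still assigns in the calling environment.
med <- median(y <- c(1, 5, 9))
y_created <- exists("y", envir = environment(), inherits = FALSE)

print(c(a, b, c_val))   # 10 20 30
print(avg)              # 4
print(x_created)        # FALSE
print(med)              # 5
print(y_created)        # TRUE
```

The last two lines show why mixing the roles can surprise you: median(y <- ...) leaves a variable y behind, while mean(x = ...) does not create x.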

R’s atomic types fall into a short list: numeric (double-precision floating point), integer (requires the L suffix for literals), character (strings), logical (TRUE/FALSE), plus complex and raw which are less common. Notice below that a plain number like 30 is classed as "numeric", not "integer" — this is often surprising to newcomers.
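
As a quick way to explore the list above, base R's typeof() reports the underlying storage type, which is slightly more specific than class(): it says "double" where class() says "numeric". This is a small illustrative sketch, not part of the tutorial's scripts.

```r
# typeof() reports the internal storage type of each atomic value
print(typeof(30))           # "double"
print(typeof(30L))          # "integer"
print(typeof("Ada"))        # "character"
print(typeof(TRUE))         # "logical"
print(typeof(2 + 3i))       # "complex"
print(typeof(as.raw(255)))  # "raw"

# Both doubles and integers count as numeric, but only 30L is an integer
print(is.numeric(30L))      # TRUE
print(is.integer(30))       # FALSE
print(is.integer(30L))      # TRUE
```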

Create a file named variables.R:

# Variables and Types in R

# The <- operator is idiomatic R assignment
age <- 30
height <- 5.9
name <- "Ada"
is_student <- TRUE

# Print each variable
print(age)
print(height)
print(name)
print(is_student)

# Check the class (user-facing type) of each
print(class(age))
print(class(height))
print(class(name))
print(class(is_student))

# Integer requires the L suffix; otherwise R stores as double
count <- 100L
print(count)
print(class(count))

# = also works, though <- is strongly preferred in R style
pi_value = 3.14159
print(pi_value)

Run it with:

# Pull the R image (only needed once)
docker pull r-base:4.4.2

# Run the script
docker run --rm -v "$(pwd)":/app -w /app r-base:4.4.2 Rscript variables.R

Expected Output

[1] 30
[1] 5.9
[1] "Ada"
[1] TRUE
[1] "numeric"
[1] "numeric"
[1] "character"
[1] "logical"
[1] 100
[1] "integer"
[1] 3.14159

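The [1] prefix is the index of the first element printed on that line. It appears even for a single value because each “value” is actually a length-1 vector: 30 is the first (and only) element of a numeric vector. Character strings print with quotes and logicals print unquoted. The integer 100L prints without a decimal point, but so does the double 30, so printed output alone cannot distinguish the two; class() returns R’s user-facing type name and settles the question.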
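
The role of the bracketed prefix becomes clearer with a vector long enough to wrap. The sketch below temporarily narrows the console width (the value 40 is an arbitrary choice for illustration) so that each wrapped line shows the index of its first element.

```r
# Narrow the console so a 30-element vector wraps onto several lines
old <- options(width = 40)
lines <- capture.output(print(101:130))
options(old)  # restore the previous width

# Each printed line begins with the index of its first element
cat(lines, sep = "\n")
```

Every line of the output starts with a bracketed index, and [1] is simply the first of them.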

Type Coercion and Conversion

Because R is strongly typed, it will not silently let you add a number to a string. But R offers explicit conversion functions (as.numeric, as.integer, as.character, as.logical) and also applies automatic coercion rules inside vectors and arithmetic expressions. The coercion hierarchy is: logical → integer → numeric → character. When values of different types appear together, R promotes everything to the most general type in this chain.
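
Each step of the hierarchy can be checked directly by combining two neighbouring types with c(). The sketch below also shows the strong-typing side: arithmetic between a number and a string raises an error, which tryCatch() captures here so the script keeps running. The variable name err is made up for illustration.

```r
# Combining neighbouring types promotes to the more general one
print(class(c(TRUE, 2L)))     # "integer"
print(class(c(2L, 3.5)))      # "numeric"
print(class(c(3.5, "a")))     # "character"

# With all four present, everything becomes character
print(c(TRUE, 2L, 3.5, "a"))  # "TRUE" "2" "3.5" "a"

# Arithmetic does not coerce strings: this is an error, not "2"
err <- tryCatch(1 + "1", error = function(e) conditionMessage(e))
print(err)
```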

A particularly useful quirk: logicals are fully usable in arithmetic, with TRUE becoming 1 and FALSE becoming 0. This makes sum(x > 0) an idiomatic way to count matching elements in a vector — the comparison produces logicals, and sum() adds them as numbers.
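
A small sketch of the counting idiom, using a made-up vector of temperatures. Because mean() also treats logicals as 0 and 1, the same comparison gives the proportion of matching elements.

```r
# Hypothetical sample data for illustration
temps <- c(-3.5, 0, 4.2, 7.1, -1.0, 2.8)

# The comparison yields one logical per element
above_zero <- temps > 0
print(above_zero)        # FALSE FALSE TRUE TRUE FALSE TRUE

# sum() counts the TRUE values; mean() gives their proportion
print(sum(above_zero))   # 3
print(mean(above_zero))  # 0.5
```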

Create a file named type_conversion.R:

# Explicit conversion: string to number
num_str <- "42"
converted <- as.numeric(num_str)
print(converted)
print(class(converted))

# Explicit conversion: number to string
age <- 30
age_str <- as.character(age)
print(age_str)
print(class(age_str))

# Logical coerces to numeric in arithmetic (TRUE = 1, FALSE = 0)
total <- TRUE + TRUE + FALSE
print(total)

# Automatic coercion inside c(): logical -> numeric
mixed_num <- c(1, 2, TRUE, FALSE)
print(mixed_num)
print(class(mixed_num))

# An invalid conversion yields NA with a warning, not an error
# (the warning is suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric("not a number"))
print(bad)

Run it with:

docker run --rm -v "$(pwd)":/app -w /app r-base:4.4.2 Rscript type_conversion.R

Expected Output

[1] 42
[1] "numeric"
[1] "30"
[1] "character"
[1] 2
[1] 1 2 1 0
[1] "numeric"
[1] NA

Notice how c(1, 2, TRUE, FALSE) produced the numeric vector 1 2 1 0: the logicals were promoted to numbers because numeric is higher in the coercion hierarchy than logical. The final NA shows R’s error-tolerant approach — an invalid conversion issues a warning and returns NA rather than halting the program, which matters when cleaning up messy real-world data.
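
To see why this tolerance helps with messy data, here is a minimal sketch that converts a whole character vector at once. The raw_ages values are invented for illustration; the unparseable entries become NA and the rest of the vector survives.

```r
# Hypothetical raw input with two unusable entries
raw_ages <- c("34", "27", "unknown", "41", "")

# Convert the whole vector in one call; bad entries become NA
ages <- suppressWarnings(as.numeric(raw_ages))
print(ages)                      # 34 27 NA 41 NA

# Count the failures, then summarise the values that did parse
print(sum(is.na(ages)))          # 2
print(mean(ages, na.rm = TRUE))  # 34
```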

Special Values and Vectors

R has four distinct “missing” or “unusual” values, and understanding them is essential for data analysis. NA represents a missing value of some type; for example, an empty cell in a numeric column of an imported CSV typically becomes NA. NULL represents the absence of a value altogether (different from “missing”) and has length 0. NaN is “not a number”, the result of undefined operations like 0/0. Inf is positive infinity, the result of 1/0; its negative counterpart is -Inf.
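
The distinctions are easiest to remember by producing each value and probing it. This is an illustrative sketch; the variable names are made up.

```r
# Arithmetic that produces the numeric special values
nan_val <- 0 / 0
pos_inf <- 1 / 0
neg_inf <- -1 / 0
print(c(nan_val, pos_inf, neg_inf))  # NaN Inf -Inf

# NaN counts as missing, but NA is not NaN
print(is.na(nan_val))   # TRUE
print(is.nan(NA))       # FALSE

# NA occupies a slot in a vector; NULL vanishes entirely
with_na <- c(1, NA, 2)
with_null <- c(1, NULL, 2)
print(length(with_na))    # 3
print(length(with_null))  # 2
print(length(NULL))       # 0
```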

Finally, the vector is R’s fundamental data structure. You create one with the c() function (for “combine”). Because R has no true scalars, every value you’ve seen so far has really been a vector of length 1 — that’s what the [1] prefix in printed output has been hinting at.
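
A short sketch of what “combine” means in practice: c() flattens its arguments into one vector rather than nesting them, and a bare value is indistinguishable from a length-1 vector built with c(). The names first and combined are made up for illustration.

```r
# c() flattens nested calls into a single vector
first <- c(1, 2)
combined <- c(first, 3, c(4, 5))
print(combined)          # 1 2 3 4 5
print(length(combined))  # 5

# A bare value and a one-element vector are the same object
print(identical(30, c(30)))  # TRUE
print(is.vector(30))         # TRUE
```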

Create a file named special_values.R:

# The four special values
missing_value <- NA
empty_value <- NULL
not_a_number <- NaN
infinity <- Inf

# Each has a dedicated predicate for testing
print(is.na(missing_value))
print(is.null(empty_value))
print(is.nan(not_a_number))
print(is.infinite(infinity))

# NA propagates: anything combined with NA is usually NA
print(NA + 1)
print(NA == NA)

# In R, single values are length-1 vectors
age <- 30
print(length(age))

# Create a proper vector with c()
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
print(length(numbers))
print(class(numbers))

# Vectors work element-wise in arithmetic
doubled <- numbers * 2
print(doubled)

# Character vectors work the same way
fruits <- c("apple", "banana", "cherry")
print(fruits)
print(length(fruits))

Run it with:

docker run --rm -v "$(pwd)":/app -w /app r-base:4.4.2 Rscript special_values.R

Expected Output

[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] NA
[1] NA
[1] 1
[1] 1 2 3 4 5
[1] 5
[1] "numeric"
[1]  2  4  6  8 10
[1] "apple"  "banana" "cherry"
[1] 3

Two behaviors stand out. First, NA == NA is NA, not TRUE, because a comparison between two unknown values cannot itself be known. Always use is.na() to check for missing values, never == NA. Second, numbers * 2 works without writing a loop: R recycles the length-1 vector 2 across every element of numbers. This element-wise, vectorized behavior is central to how idiomatic R code is written.
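
Both points can be checked with a few lines. In this sketch, the shorter operand is reused (recycled) to match the longer one, and comparing against NA with == returns NA for every element, which is why is.na() is the right test. The vectors are invented for illustration.

```r
# Recycling: the shorter operand is repeated to match the longer one
numbers <- c(1, 2, 3, 4, 5, 6)
doubled <- numbers * 2             # 2 is reused for all six elements
alternating <- numbers * c(1, 10)  # c(1, 10) is reused three times
print(doubled)       # 2 4 6 8 10 12
print(alternating)   # 1 20 3 40 5 60

# == NA never answers the question; is.na() does
x <- c(1, NA, 3)
print(x == NA)    # NA NA NA
print(is.na(x))   # FALSE TRUE FALSE
```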

Key Concepts

  • Idiomatic assignment is <-, not =. Both work at the top level, but R style guides and the community overwhelmingly prefer <-. The = form is usually reserved for specifying arguments inside function calls.
  • There are no scalars — everything is a vector. A “single” value like 30 is really a numeric vector of length 1, which is why printed output starts with [1]. This underlies R’s vectorized arithmetic.
  • Numeric literals default to double-precision, not integer. Use the L suffix (100L) when you specifically need an integer type. Without it, class(100) is "numeric".
  • R is dynamically typed but strongly typed. You don’t declare types, but you can’t mix incompatible types without explicit conversion. Conversion uses the as.<type>() family of functions.
  • Automatic coercion follows a hierarchy: logical → integer → numeric → character. When values of different types meet, everything is promoted to the most general type.
  • Four special values serve distinct purposes: NA (missing), NULL (absent), NaN (undefined numeric result), and Inf (infinity). Each has a matching is.*() predicate, and NA propagates through arithmetic and comparisons.
  • Variable names can contain letters, digits, dots, and underscores, and must start with a letter or a dot. A leading dot cannot be followed by a digit, and reserved words such as if and TRUE are not allowed. Names are case-sensitive (Age and age are different). Using . in names is common in base R; using _ is common in tidyverse code.
  • Logicals coerce to numbers in arithmetic. TRUE counts as 1 and FALSE as 0, so TRUE + TRUE is 2. This makes counting with sum() over a logical vector idiomatic and fast.
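
The naming rules in the list above can be checked with base R's make.names(), which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. This is a rough sketch, and the candidate names are invented for illustration.

```r
# Hypothetical candidate names, some valid and some not
candidates <- c("age", "Age", "user_age", "user.age",
                ".hidden", "2fast", ".5x", "my var")

# A name is valid if make.names() leaves it unchanged
is_valid <- candidates == make.names(candidates)
print(data.frame(name = candidates, valid = is_valid))

# Names are case-sensitive: these are two separate variables
Age <- 1
age <- 2
print(Age == age)   # FALSE
```

The first five candidates are valid; the last three fail because they start with a digit, start with a dot followed by a digit, or contain a space.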

Running Today

All examples can be run using Docker:

docker pull r-base:4.4.2