Arrays, Pointers, Memory, and Vectors
Overview
Questions:
What are pointers and references in C++?
Why/when should I use a pointer or reference?
Pitfalls of using pointers
Objectives:
To learn about pointers
To learn about different ways of representing lists/arrays of data
Prerequisites
Understanding of basic C++ syntax
Able to compile and run simple C++ programs
Arrays
C++ supports “old-fashioned”, C-style arrays. The size of such an array must be known at compile time, so it is fixed and cannot change while the program runs. A good example is a 3D point, which is always 3 double-precision floating point numbers.
Arrays in C++, like lists in Python, are zero-indexed: the first element is arr[0], and the last element of an array of size 10 is arr[9].
Since the size is known at compile time, you can loop over all the elements of the array with a for loop, filling or reading each element inside the loop.
#include <iostream> // for std::cout, std::endl

int main(void)
{
    // Create an array of 10 integers
    int arr[10];

    // Fill the array
    for(int i = 0; i < 10; i++)
    {
        // Set the i-th value to 2*i
        arr[i] = 2*i;
    }

    // Print out the elements of the array (with the index)
    for(int i = 0; i < 10; i++)
    {
        std::cout << "Element " << i << ": " << arr[i] << std::endl;
    }
    return 0;
}
Element 0: 0
Element 1: 2
Element 2: 4
Element 3: 6
Element 4: 8
Element 5: 10
Element 6: 12
Element 7: 14
Element 8: 16
Element 9: 18
C and C++ also let you initialize an array by listing the values you want to store in curly braces. In this case the size of the array can be omitted (int arr[] = ...). However, I generally find it good practice to keep the size, because the compiler can then check it against the number of elements in the initialization list.
#include <iostream> // for std::cout, std::endl

int main(void)
{
    int arr[10] = { 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 };
    for(int i = 0; i < 10; i++)
    {
        std::cout << "Element " << i << ": " << arr[i] << std::endl;
    }
    return 0;
}
Element 0: 3
Element 1: 5
Element 2: 7
Element 3: 9
Element 4: 11
Element 5: 13
Element 6: 15
Element 7: 17
Element 8: 19
Element 9: 21
Exercise
What happens if you specify more than 10 elements in the braces? What happens if you specify fewer? Try compiling it and see.
Solution
Specifying more will result in a compiler error. Specifying fewer initializes the elements given, and the remaining elements are set to zero.
Overflow
What happens when you try to write beyond the bounds of an array? This is called a buffer overflow. C-style arrays do no bounds checking, so the behavior is undefined: depending on many factors, the program may crash, silently corrupt other data, or appear to work. Reading out of bounds is also undefined behavior; it may crash, or you may just read garbage.
Addresses
A word of warning
Pointer mismanagement is a common source of bugs in C and often in C++. These bugs often lead to crashes (if you are lucky), but can also lead to silent, hard-to-debug issues or security vulnerabilities.
In C/C++, every variable you declare occupies a defined amount of memory at a particular location (its address). When the variable is accessed, the compiler generates (binary) code to look up the data at that location.
We can get the location of a variable using the address-of operator (&). You will rarely need to print an address, but doing so can help when debugging tricky bugs.
Create a new file and try out the following code.
#include <iostream> // for std::cout, std::endl

int main(void)
{
    int j = 1234;
    std::cout << "Address of j: " << &j << std::endl;
    return 0;
}
When I run it, I get the following output. The address will likely be different on your computer, and may change each time you run the program.
Address of j: 0x7ffd0046f874
The address is printed in hexadecimal, but its exact value is very rarely important.
Pointers
Now that we know how to get the address of a variable, we can store that address in another variable. Such a variable is a pointer: it holds the memory address (the starting byte) of the original variable.
To declare one, we give the new variable a pointer type, denoted with a star (*) after the pointed-to type. For example, int * is a pointer to an int.
#include <iostream> // for std::cout, std::endl

int main(void)
{
    int j = 1234;
    std::cout << "Address of j: " << &j << std::endl;
    std::cout << "Value of j: " << j << std::endl;

    int * pj = &j; // pj points to address of j
    std::cout << "Value of pj: " << pj << std::endl;
    return 0;
}
Address of j: 0x7fff9eebe4ec
Value of j: 1234
Value of pj: 0x7fff9eebe4ec
You will notice that the value of the pointer pj is the address that we assigned to it. So how can we retrieve the actual value stored in the pointed-to memory? By using the star again, which in this context is called the dereference operator.
#include <iostream> // for std::cout, std::endl

int main(void)
{
    int j = 1234;
    std::cout << "Address of j: " << &j << std::endl;
    std::cout << "Value of j: " << j << std::endl;

    int * pj = &j; // pj points to address of j
    std::cout << "Value of pj: " << pj << std::endl;

    // *pj dereferences the pointer: it reads the int stored at that address
    std::cout << "Value of *pj: " << *pj << std::endl;
    return 0;
}
Address of j: 0x7ffd8e66f70c
Value of j: 1234
Value of pj: 0x7ffd8e66f70c
Value of *pj: 1234
So now we’ve shown that we can access the same memory location, but can we change the value?
Exercise
Change the main function to modify the value of j through the pointer pj.
Solution
We can assign data directly to *pj:
#include <iostream> // for std::cout, std::endl

int main(void)
{
    int j = 1234;
    std::cout << "Value of j: " << j << std::endl;

    int * pj = &j; // pj points to address of j
    std::cout << "Value of *pj: " << *pj << std::endl;

    // Change j via pj
    *pj = 5678;
    std::cout << "Value changed!" << std::endl;
    std::cout << "New Value of j: " << j << std::endl;
    std::cout << "New Value of *pj: " << *pj << std::endl;
    return 0;
}
Value of j: 1234
Value of *pj: 1234
Value changed!
New Value of j: 5678
New Value of *pj: 5678
Exercise
What is the size of a pointer (that is, how many bytes)? Does it change with the type that is being pointed to? What does that mean about the amount of memory that you can access from your program?
Solution
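On modern 64-bit systems, pointers are 8 bytes (64 bits). On mainstream platforms the size does not depend on the pointed-to type, so an int * and a double * are the same size. Since each address refers to a single byte, a 64-bit pointer can in principle address up to 2^64 bytes; current hardware actually uses fewer address bits. On older 32-bit machines, pointers were 4 bytes, so a program could address at most 2^32 bytes = 4 GiB of memory.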
Null Pointers
Any pointer can be set to a special value, nullptr. In code written before C++11 (and in C), NULL is used instead. A null pointer points to nothing, and it is often used as a signal that the pointer has not been set to anything yet. Dereferencing a null pointer is undefined behavior; in practice it usually crashes the program (for example, with a segmentation fault).
If you are not setting a pointer to point to existing data, or to memory allocated via new (see below), it is good practice to set it to nullptr. If you don’t, an uninitialized pointer holds whatever value happened to be in that memory, so it might point to a random area of memory.
Manual addressing of pointers
Can you manually set the address stored in a pointer, with something like int * j = reinterpret_cast<int *>(0x13579bdf);? Yes. The cast is required, because C++ will not implicitly convert an integer to a pointer.
However, in scientific computing this is almost never done. It does have uses on embedded platforms, microcontrollers, etc., but you will likely never need it in your entire scientific career.
Dynamic memory allocation
Think about the array examples: the number of elements must be known at compile time. But what if we don’t know it until the program runs?
For example, let’s say we want to read in the coordinates of a molecule. How big should that array be? One solution is to make a very large array and then use only part of it. This not only wastes memory, but can also cause many headaches down the road when someone tries to run with a molecule larger than the array.
The solution is dynamic memory allocation. In C++, this is done with the new keyword, which returns (you guessed it) a pointer! It points to newly allocated memory large enough to hold the number of elements you requested.
If you allocate memory with new, you must also deallocate it with delete. This tells the memory allocator that you are done with that memory, so it can be reused.
If you fail to deallocate (free) the memory, you have a memory leak. When your program ends, the operating system will still reclaim any memory that you did not free. However, if your program is long-lived (i.e., a very long simulation), its memory usage can grow steadily and may even trigger an OOM (out-of-memory) condition in the operating system, which will kill your process (best-case scenario) or kill someone else’s (worst-case).
#include <iostream> // for std::cout, std::endl

int main(void)
{
    // Would come from somewhere like a function argument
    int n_doubles = 16;

    // Store the result of new in a pointer
    // Note that the type of the pointer and the type
    // passed to 'new' must match
    double * data = new double[n_doubles];

    // Loop over it like before
    for(int i = 0; i < n_doubles; i++)
    {
        data[i] = 3.1415 * i;
    }
    for(int i = 0; i < n_doubles; i++)
    {
        std::cout << "Element " << i << ": " << data[i] << std::endl;
    }

    // When done, free it
    delete [] data;
    return 0;
}
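Note the [] given to the delete operator. It tells delete that it is deleting an array. You can also allocate a single object (e.g., double * data = new double;), in which case you free it with delete data;, without the []. Always match the two forms: memory from new[] must be freed with delete [], and memory from new with delete; mixing them is undefined behavior. Allocating single objects is less common in scientific computing, but still occasionally needed.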
A better way: std::vector
Dynamic-sized, resizable arrays of data are commonly needed; Python’s list is one example of this idea. If you need a dynamically-sized array of data that behaves like a list in Python, you can use std::vector in C++. A std::vector is an array that can be appended to and resized just like a list, while the memory management shown above is handled automatically. In general, I recommend using std::vector rather than pointers and new: it is much easier to work with and comes with useful features, while its overhead is negligible in almost all cases.
The big difference between a C++ vector and a Python list is that a vector is homogeneous: it can only hold data of a single, specified type. For example, you cannot add a string to a vector of integers. That is possible in Python, where lists are heterogeneous.
Another difference is that C++ does not support Python’s slicing syntax (with :).
To use std::vector, we must add #include <vector> at the top of our file.
To declare a std::vector, we put the element type in angle brackets. For example, a std::vector of double is declared as std::vector<double>.
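Member functions can then be used to access and modify the vector, similar to methods on Python lists. In the table below, v is a std::vector and l is a Python list.
| C++ | Python | Description |
|---|---|---|
| v[i] | (none) | Access an element at an index (no bounds check) |
| v.at(i) | l[i] | Access an element at an index (with bounds checking) |
| v.push_back(x) | l.append(x) | Add an element to the end |
| v.back(), then v.pop_back() | l.pop() | Remove & return the last element (in C++, pop_back() removes it but returns nothing) |
| v.size() | len(l) | Get the number of elements in the list/vector |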
We can rewrite the dynamic-allocation example from the previous section to use a std::vector instead:
#include <cstddef>  // for std::size_t
#include <iostream> // for std::cout, std::endl
#include <vector>   // for std::vector

int main(void)
{
    int memsize = 16;

    // A dynamic array of double
    std::vector<double> data;
    for(int i = 0; i < memsize; i++)
    {
        data.push_back(3.1415 * i);
    }

    // size() returns an unsigned type, so use std::size_t for the index
    for(std::size_t i = 0; i < data.size(); i++)
    {
        std::cout << "Element " << i << ": " << data[i] << std::endl;
    }

    // Memory is freed automatically! No need to delete/deallocate
    return 0;
}
std::vector also supports initialization from a list of values, just like a C-style array: std::vector<double> data = { 1.0, 2.0, 3.0 };
Safely accessing elements of a vector
A std::vector has an .at() member function that takes an index. It is interchangeable with square brackets [], except that .at() checks the index and throws a std::out_of_range exception if it is beyond the bounds of the vector. If nothing catches the exception, the program terminates with a runtime error:
#include <iostream> // for std::cout, std::endl
#include <vector>   // for std::vector

int main(void)
{
    int memsize = 16;

    // A dynamic array of double
    std::vector<double> data;
    for(int i = 0; i < memsize; i++)
    {
        data.push_back(3.1415 * i);
    }

    std::cout << "Element 100: " << data.at(100) << std::endl;
    return 0;
}
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 100) >= this->size() (which is 16)
Aborted (core dumped)
std::array
The C++ standard library also has a replacement for C-style, fixed-size arrays (like int arr[5]). It is called std::array, and it takes both the element type and the constant size between the angle brackets (as opposed to just the type, as with std::vector). To use it, add #include <array>.
This kind of object is useful, for example, for 3D points (the x, y, z coordinates of an atom in a molecule).
#include <array>    // for std::array
#include <iostream> // for std::cout, std::endl

int main(void)
{
    std::array<int, 5> c_arr1 = {1, 2, 3, 4, 5};
    std::array<int, 5> c_arr2 = {6, 7, 8, 9, 10};

    // OK with the C++ standard library: assignment copies every element
    c_arr2 = c_arr1;

    for(int i = 0; i < 5; i++)
    {
        std::cout << "c_arr1[" << i << "] = " << c_arr1[i] << std::endl;
        std::cout << "c_arr2[" << i << "] = " << c_arr2[i] << std::endl;
    }
    return 0;
}
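What are the benefits of using std::array over C-style arrays? std::array provides member functions like .at(), which does bounds checking. But perhaps the most important benefit is that copying a std::array (for example, when passing it to a function) is much clearer than copying a C-style array, which can be very tricky. A C-style array cannot be assigned to another array, and when it is passed to a function it decays to a pointer to its first element, losing its size.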
int main(void)
{
    int c_arr1[5] = {1, 2, 3, 4, 5};
    int c_arr2[5] = {6, 7, 8, 9, 10};

    // ERROR! C-style arrays cannot be assigned; this does not compile
    c_arr2 = c_arr1;
}
In general, std::array
should be preferred over C-style arrays.
Exercise
How would you specify a std::vector of 3D points?
Solution
std::vector<std::array<double, 3>>
Typedefs
The solution to the previous exercise shows that types in C++ can become cumbersome. Fortunately, C++ provides a way to make types more manageable by letting the programmer give them a shorter, more descriptive name. This is done with the typedef keyword.
typedef std::array<double, 3> AtomCoord;
typedef std::vector<AtomCoord> Coordinates;
The types AtomCoord and Coordinates can now be used:
AtomCoord coord1 = {1.0, 2.0, 3.0};
Coordinates coords;
coords.push_back(coord1);