More Things about Strings

[Note: This writeup is not relevant for Project 2.]

A string is a sequence of zero or more characters. The size function tells you how many characters are in the string:

        string s = "Hello";
        cout << s.size();   // writes 5
        s = "Wow";
        cout << s.size();   // writes 3
        s = "";
        cout << s.size();   // writes 0

(For historical reasons, there is also a length function that returns the same value that size does. In other words, s.length() and s.size() may be used interchangeably.)

You can access individual characters in a string using the at function. The positions of the characters in a string are numbered from left to right, starting at 0. Your program will die with a runtime error if it tries to access a character at a position that is out of range for the string.

(You can also access individual characters in a string using the [] operator. Your program's behavior is undefined if it tries to access a character at a position that is out of range for the string.)

                                 //  01234
        string s = "Hello";      //  Hello
        cout << s.at(0);   // writes H
        cout << s.at(4);   // writes o
        cout << s.at(6);   // Runtime error!
        cout << s.at(-1);  // Runtime error!

To visit every character in a string (for example, to write each character of the string on a line by itself), you can say

        string s = "Hello";
        for (int k = 0; k != s.size(); k++)
            cout << s.at(k) << endl;

[You don't have to read the nerdy footnote at the bottom of this page that has something to say about the loop above.]

Another thing you can do with a string is to append characters to the end of the string. The += operator lets you do this. (This is a different use of the operator than the one that lets you add a number to an int or double variable.) Here's an example where we copy all of the non-blank characters from the string s to the string t:

        string s = "Hello there!  How are you?";
        string t;   // automatically initialized to the empty string
        for (size_t k = 0; k != s.size(); k++)
        {
            if (s.at(k) != ' ')   // If s.at(k) is not a blank
                t += s.at(k);     //   append s.at(k) to t
        }
        cout << t;  //  writes Hellothere!Howareyou?

Notice that when talking about constants representing single characters, we use single quote marks, not double quote marks. C++ distinguishes between the type string, objects of which are sequences of zero or more characters, and the type char, objects of which are always a single character. If s is a string, then the expression s.at(k) is a char. The language lets us compare a char with another char, like the constants ' ' or '@' or 'A'. (The single quotes denote a char constant.)

You are also able to copy a substring of a string. For example, here's how we can copy the substring of s starting at position 5 and going for 3 characters:

                                  // 012345678
        string s = "duplicate";   // duplicate
        cout << s.substr(5,3);    // writes cat 

Here's how to clip off the first six characters of a string:

        string t = "fingernail";
        t = t.substr(6, t.size()-6);  // t is now "nail"

Sometimes we want to classify characters, asking, for example, whether they are letter characters or digit characters. If you say

        #include <cctype>

then you can use character classification functions like these:

                                 //  012345678
        string s = "30 For 30";  //  30 For 30
        if (isdigit(s.at(0)))    // tests as true, since '3' is a digit character
          ...
        if (isalpha(s.at(3)))    // tests as true, since 'F' is a letter
          ...
        if (isupper(s.at(3)))    // tests as true, since 'F' is an uppercase letter
          ...
        if (islower(s.at(5)))    // tests as true, since 'r' is a lowercase letter
          ...
        if (islower(s.at(3)))    // tests as false, since 'F' is not a lowercase letter
          ...
        if (isalpha(s.at(2)))    // tests as false, since ' ' is not a letter
          ...
        if (isalpha(s.at(0)))    // tests as false, since '3' is not a letter
          ...

This code copies all non-letters in a string:

        string s = "#1 in 2025: Yeah!";
        string t;
        for (size_t k = 0; k != s.size(); k++)
            if (!isalpha(s.at(k)))  // if not a letter
                t += s.at(k);      //   append it to t

        // t is now "#1  2025: !"

Caution: For historical reasons, isalpha, isdigit, etc., return an int, not a bool. If the condition they test for is met, they return a non-zero value (which tests as true), but that value might be a non-zero value other than 1. So to test if the condition is met, write your test as, say,

        if (isalpha(ch))

instead of

        if (isalpha(ch) == true)  // WRONG!!!!

since in a comparison involving an int and a bool, the bool will be converted to int; because true converts to 1, and the non-zero int that isalpha returns for a letter might not be 1, the condition for the if might evaluate to false.

The function tolower, when given an uppercase letter, returns the lowercase equivalent of that letter; when given any other character, it just returns that same character. So

        string s = "Don't SHOUT!";
        string t;
        for (size_t k = 0; k != s.size(); k++)
            t += tolower(s.at(k));
        cout << t;

writes don't shout! Similarly, the function toupper returns the uppercase equivalent of a lowercase letter, and returns any other character unchanged.

There's a lot more you can do with strings and characters, but the information in this tutorial will suffice to enable you to do Project 3.

Nerdy footnote

While a loop starting

        for (int k = 0; k != s.size(); k++)

will work for everything you're doing in this class, technically the expression s.size() returns a number of a special type defined in the library: not int, but string::size_type. This type name is a synonym for some unsigned integer type. (An unsigned integer variable can contain only whole numbers, no negatives.) It turns out that a consequence of the C++ expression rules is that if k is an int, the loop above might not work correctly for strings over 2 billion characters long, and the compiler might give you a warning about that, phrased as a "signed/unsigned mismatch" or a "comparison of integer expressions of different signedness". Since we won't be using such ridiculously long strings, declaring k to be an int is fine.

Still, it's good practice to try to get a clean build with no warnings. Like the boy who cried wolf, if the compiler gives you many warnings about things that are harmless, you won't notice the warnings you should take seriously. To eliminate the warning you might get, you should declare k to be of the technically proper type:

        for (string::size_type k = 0; k != s.size(); k++)

Most C++ library implementations make size_t synonymous with string::size_type, so you can get away with the somewhat shorter

        for (size_t k = 0; k != s.size(); k++)

Again, you don't have to do this; you can declare k to be an int if you like, but in that case be prepared for possible (harmless) signed/unsigned warnings.

If you do choose to declare k to be of type string::size_type or size_t, you need to be sure that you never try to make k negative. For example, if you try to traverse a string backward, then writing

        for (string::size_type k = s.size()-1; k >= 0; k--)  // WRONG
        {
            ... s.at(k) ...
        }

would lead to a runtime error. (An unsigned integer is always >= 0, so the loop condition is always true. If an unsigned integer k is 0 when you execute k--, it ends up with a huge positive value, so we execute the loop body again and try to access a character at a position way past the end of the string, which the at function rejects.) One correct way to write the loop is

        string::size_type k = s.size();
        while (k > 0)
        {
            k--;
            ... s.at(k) ...
        }

Again, if we choose to make k an int, the for loop version would be fine, but we'd get the (harmless) signed/unsigned warning.