Data Abstraction in C++

Contributed by: Daniel Perron

This chapter describes how data abstraction is supported in C++. It is an introductory chapter on the subject and should serve as an entry point to the more complete discussion found in the other chapters as they are written.

Support for data abstraction is one half of C++; the other half is support for Object-Oriented Programming. Why would someone who designs a programming language want it to support data abstraction? Because it is one of the best approaches for handling the increasing complexity of software. For example, not so long ago, the only tool that people had to develop a GUI for a given environment was a library of 500 to 800 functions and data structures. To develop the GUI you had to learn which of these functions to call, in what order, and with which data structures. This was a daunting task: one function called another, which updated a data structure that resided in another file, and ... you quickly got lost. It was very easy to make errors. This is the problem with procedural programming and the libraries of functions it relies on. When you design a GUI you would like to talk, think and code in terms of scroll bars, dialog boxes, windows, etc.

This is where data abstraction comes in: data abstraction means the ability to package, as a type, the data structures and the functions which manipulate them. A language supports data abstraction if it is "easy" for the user to do this. The new types defined by the user are called abstract data types (or, as Stroustrup likes to call them, user-defined types). One of the greatest (the greatest?) achievements of Stroustrup has been to create a language in which user-defined types have support as good as (sometimes even better than) built-in types in every respect: type-checking, efficiency, etc.


You can send comments or questions about the material in this chapter by clicking here. The archive of questions and comments made can be found here.
In C++, you can talk, think and code in terms of scroll bars. Once you have written a class scroll_bar {..., you define and manipulate scroll bars. I said above that the other half of C++ is support for Object-Oriented Programming. Now that C++ has empowered the user with the ability to create his own types, it would be nice if C++ could spare him from "reinventing the wheel". In some application areas, it is clear that many of the types a user would like to define have a lot in common (GUI, or graphics in general, is one such area). To avoid rewriting the same code all the time, you try to express the commonality among the types as a type (or class) hierarchy, where the types at the base of the hierarchy are very general and become more and more specialized as you refine them. If a language makes it "easy" for the user to create a hierarchy of types and refine the behavior of the types in it, then this language is said to support Object-Oriented Programming. (How did I manage to present OOP without even once writing the word "inheritance"? It is probably a rotten presentation, but this chapter is about data abstraction :-)

C++ is a big language, but essentially all of its features can be traced to support for Data Abstraction and Object-Oriented Programming (and to doing both efficiently, using all the help the compiler can give you, e.g. strong typing). I would like to be able to define a Small C++ and forget about its "big brother", but I know this is wishful thinking. If you start with C, want to add support for DA and OOP, and want to keep the "spirit" of C (which is flexibility and efficiency), you will end up with C++ (or, if you are not as talented as Stroustrup, you will almost certainly end up with something much worse). In the rest of this chapter, I will show you how data abstraction is supported in C++ and introduce many of the features of the language that you have to use if you want useful user-defined types.


I will create a "toy" type (or class, to use C++ terminology), a class that contains only a pointer, and introduce many features of C++ to make this "toy" class well-behaved.

Here is the smallest "interesting" class we can start with:

	class String {
	  char * cptr;
	};
class is a reserved word in C++; its purpose is to tell the compiler that the description of a user-defined type begins. Following the word class comes the name of the user-defined type, in our case "String", and this type contains only a pointer to a char. With this declaration you can write a program like this (assuming the declaration of String appears earlier in the file which contains the program):

	int main() {
	  String a, b; 
	  b = a;
	}
This program defines two objects of type String, a and b, and assigns the value of a to b.

The previous example shows typical uses of a new user-defined type or class in C++. You want to be able to define new objects of this class and do some operations on them. The following syntax was used to define a new object of a class in the previous example: String a (you use the class name followed by an identifier). This tells the compiler to create a new object and make it available. Except in the most trivial cases, you should not rely on the default operations of the compiler to create and initialize your objects. You should add your own constructors and destructors and this is how it's done:

	class String {
	  private:
	   char * cptr;
	  public:
	   String(char *); // String constructor
	   ~String();      // String destructor
	};

We've added two keywords, private and public; these keywords control access to the members of the class. The members listed after the keyword public are the only ones accessible to a user of your class. The members listed after private are accessible only to the functions declared in the class (and to so-called friends). So the pointer cptr is accessible only to the members String(char *) and ~String(). If a member is not listed under any of these keywords (protected is another keyword which can be used in this context) then it is assumed to be private, as in the first example. There are now two kinds of members in our class: data members and member functions. Here is one possible implementation of the member functions, the constructor String(char *) and the destructor ~String():

	#include <string.h>   // strlen, strcpy

	String::String(char * a_cptr) {
	  if (a_cptr) {
	    cptr = new char[strlen(a_cptr)+1];
	    strcpy(cptr, a_cptr);
	  } else {
	    cptr = new char[1];
	    cptr[0] = '\0';
	  }
	}
	String::~String() {
	  delete [] cptr;
	}
The constructor and destructor are needed to ensure that only well-formed (and well-behaved) objects are used. Since C++ allows us to define and use arbitrary classes, we need a way to express how their objects are created and destroyed, and this is the purpose of the constructor and destructor of a class (a class can have many constructors but only one destructor). In our example, the constructor is passed a char * as argument. If this char * is not the null pointer, the data member cptr is assigned the address of a newly allocated block of memory with enough room for the string passed as argument (strings are implemented as pointers to characters in C and are null terminated; this is the reason for the +1 in the code). After the memory has been allocated, we call the standard C library function strcpy to copy the content of the string passed as argument into our new object. If the argument is the null pointer, we allocate enough memory for a single character and store the terminating null character in it (so that we always have a string terminated by a null character).
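The life cycle just described can be observed with a small experiment. The following is only a sketch under a few assumptions: the constructor takes a const char * so that current compilers accept string literals, copying is switched off because the chapter has not yet said what a copy should mean, and the static counter live and the member length() are hypothetical additions that are not part of the chapter's String class.

```cpp
#include <cassert>
#include <cstring>

// Variant of the chapter's toy class. `live` and `length()` are
// hypothetical additions used only to observe what happens.
class String {
    char* cptr;
public:
    static int live;   // number of String objects currently alive

    String(const char* a_cptr = nullptr) {
        if (a_cptr) {
            cptr = new char[std::strlen(a_cptr) + 1];  // +1 for '\0'
            std::strcpy(cptr, a_cptr);
        } else {
            cptr = new char[1];                        // empty string
            cptr[0] = '\0';
        }
        ++live;
    }
    ~String() {
        delete[] cptr;   // matches the new char[...] in the constructor
        --live;
    }
    // Copying is disabled in this sketch (assumption, see lead-in).
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    std::size_t length() const { return std::strlen(cptr); }
};

int String::live = 0;

// Creates two objects in a local scope and reports how many were alive
// inside it. Both destructors run when the function returns.
int live_inside_scope() {
    String a("Hello");
    String b;                  // default argument: the empty string
    assert(b.length() == 0);
    return String::live;
}
```

Calling live_inside_scope() from main and then reading String::live shows the counter going back down once a and b have left their scope.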


When the object goes out of scope, its destructor is called; in our case, we simply release the memory that was allocated in the corresponding constructor call. We wrote [] in the call to delete because the memory was obtained with new char[...] in the constructor: delete [] is the form that matches new [], and it ensures that the destructor is run for every element of the array. For an array of char, which has no destructor, you are unlikely to notice a difference in practice, but releasing memory obtained from new [] with a plain delete is undefined behaviour, and when the constructor allocates an array of objects, forgetting the [] may mean that the destructors of all but the first element never run, or worse.
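To see that delete [] really runs one destructor per element, here is a minimal sketch. The type Probe and the function destroy_array are hypothetical helpers invented for this illustration; they have nothing to do with the String class itself.

```cpp
#include <cassert>

// Hypothetical helper type that counts destructor calls.
struct Probe {
    static int destroyed;
    ~Probe() { ++destroyed; }
};
int Probe::destroyed = 0;

// Allocates n objects with new[] and releases them with delete[].
// Returns how many destructors ran.
int destroy_array(int n) {
    assert(n > 0);
    Probe::destroyed = 0;
    Probe* p = new Probe[n];
    delete[] p;          // matches new[]: one destructor call per element
    return Probe::destroyed;
}
```

The sketch deliberately does not show the mismatched form (plain delete after new []), since its behaviour is undefined and cannot be demonstrated reliably.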

To use the constructor above as a default constructor, we would have to change the signature of the constructor to include a default argument like this:

	class String {
	  private:
	   char * cptr;
	  public:
	   String(char * = 0);  //default constructor
	   ~String();           //destructor
	};
Now let's try a small example (assuming the code for the String class appears earlier in the file):

	int main() {
	  String a("Hello");
	  String b("World");
	
	  b = a;
	  cout << b.cptr << endl;   // error: cptr is private
	  cout << a.cptr << endl;   // error: cptr is private
	}
As written, this program does not compile: cptr is private, so main is not allowed to read a.cptr or b.cptr (I must have been dreaming when I first claimed otherwise). Set the two output statements aside, because the major problem with this program lies elsewhere: after the assignment statement b = a; b points to the string "Hello" (we have copied pointers, not characters) and the memory allocated for "World" is now unreachable. Moreover, a and b now share the same block of memory, so as soon as one of them goes out of scope its destructor makes the pointer in the other invalid, and at the end of main both destructors apply delete [] to the same address. You definitely don't want this kind of behaviour from your objects. What you need is to define the meaning of assignment for your objects, and this is done by coding an assignment operator. The new String class becomes:


	class String {
	  private:
	   char * cptr;
	  public:
	   String(char * = 0);  //default constructor
	   ~String();           //destructor
	   String & operator=(const String & from); //assignment operator
	};
And the assignment operator can be coded like this:

	String & String::operator=(const String & from) {
	  if (this == &from)
	    return * this;
	  else {
	    delete [] cptr;
	    cptr = new char[strlen(from.cptr)+1];
	    strcpy(cptr, from.cptr);
	    return * this;
	  }
	}
Let's look at what is going on here. We'll return to the function signature a bit later. The member function operator=(...) is just an ordinary member function which uses the syntax = as its name (in C++ you can overload, i.e. give a different meaning to, most operators). It first checks whether we are trying to assign a variable to itself, and if so it returns immediately. The check is needed because the normal behaviour of the assignment operator is to release the memory of the object assigned to (the statement delete [] cptr; above) and then copy the content of the from object into it. Without the check for self-assignment, we would delete the memory just before we need it. In C++, you are often in a situation where you need to refer to the very object you are writing the code for (the object "assigned to" in the previous discussion). To handle such situations C++ provides a constant pointer, this, which always points to "this" object.
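The effect of this assignment operator can be checked with a short, self-contained sketch. It makes some assumptions of its own: the constructor takes a non-null const char * (so string literals are accepted by current compilers), copy construction is disabled because the chapter only introduces it later, and c_str() is a hypothetical accessor added so the test can look at the buffer.

```cpp
#include <cassert>
#include <cstring>

class String {
    char* cptr;
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete[] cptr; }
    String(const String&) = delete;   // copy construction comes later

    String& operator=(const String& from) {
        if (this == &from)            // self-assignment: nothing to do
            return *this;
        delete[] cptr;                // release the old buffer
        cptr = new char[std::strlen(from.cptr) + 1];
        std::strcpy(cptr, from.cptr);
        return *this;
    }

    const char* c_str() const { return cptr; }  // hypothetical accessor
};

void demo_assignment() {
    String a("Hello");
    String b("World");

    b = a;
    assert(std::strcmp(b.c_str(), "Hello") == 0);
    assert(a.c_str() != b.c_str());   // two separate buffers, no sharing

    String& alias = a;
    a = alias;                        // self-assignment through an alias
    assert(std::strcmp(a.c_str(), "Hello") == 0);
}
```

Removing the this == &from test would make the last assignment read from a buffer that has just been deleted, which is exactly the situation the paragraph above warns about.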

Now let's look at the function signature: it takes a single argument of type "reference to a constant String" and returns a "reference to a String". The type of the argument is relatively simple to justify. C++ and C pass arguments "by value" as a default rule. When you pass an argument to a function "by value", the function receives a copy of the value of the argument at the time of the call. When dealing with large objects this is an inefficient way of communicating with functions. C allows you to pass a pointer instead if you are not satisfied with the default "pass by value". C++ has introduced a third way of passing arguments: "by reference". It is as efficient as passing a pointer, but it avoids the complications that arise from the unappealing syntax of pointers when dealing with overloaded functions and operators. Since we don't expect the value of the argument of an assignment operator to change during the assignment, we declare the parameter to be constant (we'll have more to say about the brave new world of constant objects later). The assignment operator is declared to return a reference to a String object. We do this so that we can write multiple assignments in the same statement, as in a = b = c;. The compiler translates this into the equivalent form a.operator=(b.operator=(c)). Written in this form, we see that the return value of b.operator=(c) has to be usable as the argument of a.operator=(). One way of doing this is to have operator=() return a reference to a String object.
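The difference between passing by value and by reference, and the chaining made possible by returning a reference, can be sketched as follows. Item and the two is_callers_object_... functions are hypothetical names chosen for this illustration only.

```cpp
#include <cassert>

// Hypothetical small type standing in for a "large" object.
struct Item {
    int value;
    explicit Item(int v) : value(v) {}
    Item(const Item& from) : value(from.value) {}
    Item& operator=(const Item& from) {
        value = from.value;
        return *this;             // reference to the object assigned to
    }
};

// By value: x is a copy, so it lives at a different address.
bool is_callers_object_by_value(Item x, const Item* caller) {
    return &x == caller;
}

// By reference: x is just another name for the caller's object.
bool is_callers_object_by_reference(const Item& x, const Item* caller) {
    return &x == caller;
}

void demo_reference() {
    Item a(1), b(2), c(3);

    assert(!is_callers_object_by_value(c, &c));
    assert(is_callers_object_by_reference(c, &c));

    a = b = c;                    // a.operator=(b.operator=(c))
    assert(a.value == 3 && b.value == 3 && c.value == 3);

    (a = b).value = 7;            // the result of the assignment is a itself
    assert(a.value == 7);
}
```

The last two lines are only there to show that the expression a = b designates a, not a temporary copy of it.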


Since C++ passes arguments "by value" as a default rule, it is useful to be able to say precisely what gets copied when a function is called. This is the job of the copy constructor. As a rule, it is wise to write a copy constructor for any class that refers to dynamically allocated memory, especially if your application is going to use many objects of that class. The copy constructor is called by the compiler when you pass an argument "by value" to a function, when you return an object "by value" from a function, and when you initialize one object with the value of another.
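The three situations can be made visible with a counter. This is a sketch with a hypothetical type Tracer, not the String class. One caveat: compilers are allowed (and in recent versions of the language sometimes required) to skip copies when an object is returned by value, so the check for that case only asserts that at least one copy was made.

```cpp
#include <cassert>

// Hypothetical type that counts calls to its copy constructor.
struct Tracer {
    static int copies;
    Tracer() {}
    Tracer(const Tracer&) { ++copies; }
};
int Tracer::copies = 0;

void take_by_value(Tracer) {}

// Returning a parameter that is a reference forces a copy into the result.
Tracer return_by_value(const Tracer& t) { return t; }

void demo_copies() {
    Tracer a;

    Tracer::copies = 0;
    Tracer b = a;                   // initialization from another object
    assert(Tracer::copies == 1);

    Tracer::copies = 0;
    take_by_value(a);               // argument passed by value
    assert(Tracer::copies == 1);

    Tracer::copies = 0;
    Tracer c = return_by_value(a);  // object returned by value
    assert(Tracer::copies >= 1);    // further copies may be elided

    (void)b;
    (void)c;
}
```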

For our "toy" String class, the copy constructor can be coded as follows:

	String::String(const String &from) {
	  cptr = new char[strlen(from.cptr)+1];
	  strcpy(cptr, from.cptr);
	}
Our String class declaration now looks like this:

	class String {
	  private:
	   char * cptr;
	  public:
	   String(char * = 0);     //default constructor
	   String(const String &); //copy constructor
	   ~String();              //destructor
	   String & operator=(const String &); //assignment operator
	};
It would be nice if we could add two String objects together and get their concatenation as a result. We would like to be able to write a segment of code like this:

	String a = "Hello";
	String b = " World";
	String c; 
	String d = a + b;  // d contains "Hello World"
If we try to write operator+ as a member function of the class String, we end up with the following problem:

	String operator+(const String &); //assuming this declaration
	                                  //of operator+ as a member
	String a = "Hello";
	String b = " World";
	String c = a; 
	String d = c + " World"; //works fine
	c = b;
	String e = "Hello" + c; //error

This happens because, when we write String d = c + " World";, the expression c + " World" is translated as c.operator+(" World"), and this is fine since " World" is converted to a String by the constructor String(char *) and passed by reference to the member function operator+. But "Hello" + c has no possible interpretation: the left operand is a built-in type, and we cannot add a new member operator to a built-in type. So if we want symmetric behaviour from our operator+, we have to declare it as a global function, and since this function is going to require access to the private members of the String class, we have to make it a friend:

	class String {
	friend String operator+(const String &, const String &);  
	  private:
	   char * cptr;
	  public:
	   String(char * = 0);     //default constructor
	   String(const String &); //copy constructor
	   ~String();              //destructor
	   String & operator=(const String &); //assignment operator
	};
And operator+(...) can be coded as follows:
	String operator+(const String & a, const String & b) { 
	  String tmp;
	  delete [] tmp.cptr;
	  tmp.cptr = new char[strlen(a.cptr)+strlen(b.cptr)+1];
	  strcpy(tmp.cptr, a.cptr);
	  strcat(tmp.cptr, b.cptr);
	  return tmp;
	}
We pass the two arguments as const String & to avoid some inefficient copying (in this toy example it wouldn't matter much, but for a real class it might), but we return the function result by value! We do so because we have no guarantee that an object containing the concatenation of the two arguments already exists (and this is the meaning of a reference: a new name for an already existing object). If you try to return a reference to tmp, the value returned by operator+ will be undefined, because when tmp goes out of scope (which is when control returns to the caller) its destructor is called.
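Here is a compact, self-contained sketch of the friend operator+ with both operand orders exercised. As assumptions of this sketch, the constructor takes a non-null const char * so that current compilers accept string literals, the new buffer is allocated before the old one is released, and equals() is a hypothetical helper added only for the checks.

```cpp
#include <cassert>
#include <cstring>

class String {
    char* cptr;
    friend String operator+(const String&, const String&);
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    String(const String& from) : cptr(new char[std::strlen(from.cptr) + 1]) {
        std::strcpy(cptr, from.cptr);
    }
    ~String() { delete[] cptr; }
    String& operator=(const String& from) {
        if (this != &from) {
            char* fresh = new char[std::strlen(from.cptr) + 1];
            std::strcpy(fresh, from.cptr);
            delete[] cptr;
            cptr = fresh;
        }
        return *this;
    }
    // Hypothetical helper used only by the checks below.
    bool equals(const char* s) const { return std::strcmp(cptr, s) == 0; }
};

// Global function: both operands may be converted from const char *.
String operator+(const String& a, const String& b) {
    char* joined = new char[std::strlen(a.cptr) + std::strlen(b.cptr) + 1];
    std::strcpy(joined, a.cptr);
    std::strcat(joined, b.cptr);

    String tmp;
    delete[] tmp.cptr;     // replace the empty buffer tmp started with
    tmp.cptr = joined;
    return tmp;            // returned by value: tmp itself ends here
}

void demo_concat() {
    String c("World");
    String left  = c + "!";        // String + const char *
    String right = "Hello " + c;   // const char * + String
    assert(left.equals("World!"));
    assert(right.equals("Hello World"));
}
```

With operator+ written as a member instead, the line computing right would be rejected by the compiler, which is the asymmetry described earlier.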


It is reasonable to expect a function which takes a "char *" as argument to be able to perform the same service for an object of type String. At the moment this is impossible; the compiler will not let you do it. If you have a library of functions which take arguments of type "char *", you are essentially required to rewrite these functions to take arguments of type String. This is not nice, so C++ provides a way around this problem: the conversion operator. It is defined and implemented as follows:

	class String {
	friend String operator+(const String &, const String &);  
	  private:
	   char * cptr;
	  public:
	   String(char * = 0);     //default constructor
	   String(const String &); //copy constructor
	   ~String();              //destructor
	   operator char * () { return cptr; } //conversion String to char *
	   String & operator=(const String &); //assignment operator
	};	
So from now on, when the compiler is in the following situation:

	#include <string.h>   // declares strlen, which expects a pointer to char

	String a = "Hello";
	int x = strlen(a); // this is equivalent to strlen(a.operator char *())
It will convert the String object a to a pointer to a char using the operation provided by operator char * ().
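A self-contained sketch of the conversion at work is given below. The function count_vowels is a hypothetical stand-in for a "legacy" library function that only knows about char *; the constructor taking a non-null const char * and the disabled copying are simplifications made for this sketch.

```cpp
#include <cassert>
#include <cstring>

class String {
    char* cptr;
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete[] cptr; }
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    operator char*() { return cptr; }   // conversion String to char *
};

// Hypothetical "legacy" function written long before String existed.
int count_vowels(char* s) {
    int n = 0;
    for (; *s; ++s)
        if (std::strchr("aeiouAEIOU", *s))
            ++n;
    return n;
}

void demo_conversion() {
    String a("Hello");
    assert(std::strlen(a) == 5);     // a.operator char *() is applied
    assert(count_vowels(a) == 2);    // same conversion, user function
}
```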

A natural operation on objects of type String is to access individual characters. We will implement this operation for our class String and, in doing so, venture just a bit into the "brave new world" of const objects in C++. Access to the characters of a String object will be provided by the indexing operator, operator[]. This is how it is implemented:

	char & String::operator[] (int i) { return cptr[i]; } 
                                              //no range checking!
If you want to define a constant in C, you use #define. #define is handled by the C preprocessor, and the compiler never sees the name of the constant. Now C++ is a strongly-typed language, so you want to enlist the help of your compiler to make safe and legal use of a constant; therefore C++ introduced the const keyword to handle such cases. So in a C++ program you will see const int x = 5; to initialize a constant integer to 5, and after this declaration you expect the compiler to reject any attempt to change this value. C++ hates to make a difference between built-in types and user-defined types; therefore, in C++, you are allowed to declare a variable of any type to be constant. If an object can be made constant, then you should be able to express the fact that a particular member function may be called on a constant object. You have to be consistent: once you have enlisted the help of the compiler to check the "constness" of an integer, you are led to accept that constant member functions should also be supported. The problem with support for constant objects is the gap between what you and I would like to see constant and what the compiler can actually check. Let's see a simple example. The operator[] above only returns the character cptr[i] and does not modify the object, so it seems reasonable to declare this function constant, like this (note the added const keyword after the parameter list):


	char & String::operator[] (int i) const { return cptr[i]; }
Now, if we have the following segment of code:

	const String a = "Hello";
	a[0] = 'z'; //the compiler won't complain, a = "zello" now
What happened here? We have a constant object and the compiler let us change its value. Well, no: the bits inside the object a have not changed. The String object contains only a pointer, and the value of this pointer has not changed. The compiler did its job. Do not put the blame on references as such: if operator[] returned its result "by value", the previous example would not even compile, but then nobody could write a[0] = 'z' on a non-constant String either. The real culprit is that a constant member function hands out a reference to a non-constant char. The way to fix this problem is to change the return type of operator[] to const char &, a reference to a constant char. This way the compiler will not let you write a character through the reference, and the previous segment of code will generate an error. It is easy to go over your code after learning the "benefits" of constants and add const here and there where it obviously makes sense. As the previous example showed, it is also very easy to make mistakes this way. Is the extra effort needed to learn the discipline of const worth it? I'll let you answer this. As far as I am concerned, C++ is consistent, C++ is strongly typed and treats user-defined types as "first class citizens". If I prefer const int over #define, then I have to be willing to deal with constant member functions and the like.
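The gap between "the object is constant" and "what the object points to is constant" can be reproduced in a few lines. Holder is a hypothetical stand-in for String that keeps its pointer public and borrows a buffer owned by the caller, which keeps the sketch short; it is not how the chapter's class is written.

```cpp
#include <cassert>

// Hypothetical stand-in for String: it only holds a pointer.
struct Holder {
    char* cptr;
    // Flawed on purpose: a const member handing out a non-const reference.
    char& first() const { return cptr[0]; }
};

void demo_shallow_const() {
    char buffer[] = "Hello";
    const Holder h = { buffer };

    h.first() = 'z';            // compiles: the pointer is const, the char is not
    assert(buffer[0] == 'z');

    // h.cptr = 0;              // would not compile: the pointer itself is const
}
```

Changing the return type of first() to const char & makes the assignment through h.first() a compile-time error, which is the fix described above.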

If you have modified your operator[] as mentioned above, you can be confident that the compiler will not let you change the value of an individual character in a constant String. This is all well and good, but now, if you don't define an operator[] for non-constant Strings, the const operator[] will also be called when you index a non-constant String, and you'll get the following situation:

	String a = "Hello";
	a[0] = 'z'; //error, you are not allowed to change a constant object!
If there is no operator[] defined for non-constant objects, then the constant member function will do the job (it is safe to call a constant member function on a non-constant object; it is the other way around that the compiler would not appreciate). Because the constant operator[] returns a reference to a constant character, you cannot change it. So you have no choice: you have to define operator[] for non-constant Strings as well, like this for example:

	char & String::operator[] (int i) { return cptr[i]; } 
                                         //range checking, please
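With both versions of operator[] in place, the compiler picks one according to the constness of the object. The sketch below checks this at compile time with static_assert and std::declval from the standard library; these checks, the constructor taking a non-null const char *, and the disabled copying are assumptions of the sketch, not part of the chapter's class.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <utility>

class String {
    char* cptr;
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete[] cptr; }
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    char& operator[](int i) { return cptr[i]; }              // non-const String
    const char& operator[](int i) const { return cptr[i]; }  // const String
};

// The overload chosen depends on whether the object is const.
static_assert(std::is_same<decltype(std::declval<String&>()[0]),
                           char&>::value,
              "indexing a non-const String yields char &");
static_assert(std::is_same<decltype(std::declval<const String&>()[0]),
                           const char&>::value,
              "indexing a const String yields const char &");

void demo_indexing() {
    String a("Hello");
    a[0] = 'J';                   // allowed: non-const overload
    const String& view = a;
    assert(view[0] == 'J');       // reading through the const overload
    // view[0] = 'z';             // would not compile: const char &
}
```

As in the chapter's version, neither overload checks that i is within range.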
Now that you have some feeling for the kind of support you can get from the compiler, you take a look at our String class as defined so far and wonder whether it wouldn't be a good idea to have the conversion operator convert to a constant character string instead of a plain character string as it does right now. You propose the following change:

	operator const char * () { return cptr; } // instead of operator char *
This is a very good idea because, as defined earlier, the conversion operator was essentially a loophole in the protection given to the field cptr. The field wasn't really private anymore, since anyone could change the string pointed to, like this for example:

	String a = "Hello";
	char * any_ptr = a;
	any_ptr[0] = 'z'; // a = "zello"

The whole idea of data hiding is to "localize" the changes to the fields of a data structure in a few well-defined functions, and C++ supports it with features like the private data members of a class. It is completely defeated by writing a public member function which returns a pointer (or a reference) to a private field. By defining the conversion operator to return a constant string, this problem is avoided.
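That the loophole is really closed can be verified at compile time. In the sketch below the conversion operator is additionally marked const, an assumption that goes beyond the class shown in this chapter, so that constant String objects can be converted too; std::is_convertible comes from the standard header <type_traits>.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class String {
    char* cptr;
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete[] cptr; }
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    // Read-only view of the characters; the buffer stays protected.
    operator const char*() const { return cptr; }
};

static_assert(std::is_convertible<String, const char*>::value,
              "a String can be read as a const char *");
static_assert(!std::is_convertible<String, char*>::value,
              "a String can no longer be turned into a writable char *");

void demo_const_conversion() {
    String a("Hello");
    const char* p = a;             // fine: read-only access
    assert(std::strcmp(p, "Hello") == 0);
    // char* q = a;                // would not compile
    // p[0] = 'z';                 // would not compile either
}
```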

We'll finish our presentation of data abstraction in C++ with a short example on streams and how to use them to input and output objects of type String. This discussion is a bit artificial because, given the conversion operator above, we can already output a String object: << knows how to display the built-in types, and a String object can be converted to a built-in type (using the conversion operator just described). So we can display a String object as easily as if it were an object of a built-in type. As for reading String objects from the keyboard, this is where we have to admit that our String class is a "toy" class created just to introduce the many features of C++ that support data abstraction. There is no way I can know the length of a String "a priori", and this is probably the reason why there is no built-in string type in C. So I won't show you how to read String objects from the keyboard. But I will define a new operator<< that can be adapted to your own situation.

If you try to define operator<< as a member function of the class String, you will end up having to do this:

	String a = "Hello";
	a << cout; // "Hello" is sent to the standard output device
Look carefully: this is the wrong way around, but since the left operand of every operator member function has to be an object of the class itself, you have to use this syntax (a << cout is the same as a.operator<<(cout)). This is pretty confusing, so you decide to make operator<< a global function, and since it will need access to the data member of the String class, you have to make it a friend. A possible implementation is as follows:

	ostream & operator<< (ostream & output, const String & a) {
	  return output << a.cptr;
	}
It returns a reference for the same reason the operator= was defined to return a reference (you want to be able to write cout << a << b << endl;). cout is defined to be an object of the class ostream and this operator<< will allow you to send a String object to any object of type ostream.
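Because the operator works with any ostream, it can be tried out on a string stream instead of cout, which makes the result easy to check. The sketch below uses std::ostringstream from the standard header <sstream>; the constructor taking a non-null const char * and the disabled copying are simplifications made for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>

class String {
    char* cptr;
    friend std::ostream& operator<<(std::ostream&, const String&);
public:
    // Assumes s is not null (simplification for this sketch).
    String(const char* s = "") : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete[] cptr; }
    String(const String&) = delete;
    String& operator=(const String&) = delete;
};

// Global friend: the stream is the left operand, the String the right one.
std::ostream& operator<<(std::ostream& output, const String& a) {
    return output << a.cptr;   // returning the stream allows chaining
}

void demo_output() {
    String a("Hello");
    String b(" World");
    std::ostringstream out;
    out << a << b << '!';      // chained, as with cout << a << b << endl
    assert(out.str() == "Hello World!");
}
```

Replacing out with std::cout (after including <iostream>) sends the same characters to the standard output device.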


Let's have a final look at our "toy" String class.

class String {
  private:
   char * cptr;
  public:
   String(char * = 0);     //default constructor
   String(const String &); //copy constructor
   ~String();              //destructor
   operator const char * () { return cptr; }  //conversion from String 
                                              //to a const char *
   String & operator=(const String &); //assignment operator
   char & operator[] (int i) { return cptr[i]; } //indexing operator
   const char & operator[] (int i) const { return cptr[i]; } //same
friend String operator+(const String &, const String &); //concatenation 
friend ostream & operator<< (ostream & output, const String & a); 
//friend istream & operator>> (istream & input, String & a);
//                            left as an exercise :-)
};
Amazing? Intimidating? How can a class as simple as String, with only a pointer to a character as a member, lead us to something so complicated? It is not that amazing: Stroustrup started with C and was determined to keep its efficiency, but wanted to give user-defined types the same kind of support the compiler provides for built-in types. Even though the class looks intimidating at the beginning, it quickly becomes routine to write the skeleton once you have grasped the meaning of each part. Is it too complicated? A C programmer would argue that typedef char * String accomplishes almost the same thing as the "big" String class above. This is true for a "toy" class like our String class, but it is *definitely not* true in general. Stroustrup reports in his latest book that the most frequent criticism of C++ is "the language is too big". I have tried to convince you that, starting with C and trying to create an efficient and strongly typed language which supports data abstraction and object-oriented programming, you end up with C++.

Once you have digested and mastered this introduction, plus all the concepts that it touched on only superficially, you can be sure that you have assimilated half of C++; the other half is support for object-oriented programming, which is introduced in the next chapter.


    Copyright (C) 1994  Daniel Perron

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for more details.