Data Types and Declarations in C++

SoftPrayog

In C++ programs, the types of data elements have to be declared before use. The built-in and user-defined data types are explained with examples.

Transcript

00:00Hello and welcome to this video. I am Karanesh Jhouri and this video is about

00:06data types and declarations in C++. Data is the most important part in a

00:13program. In this video, we will look at data types, how to declare data elements,

00:20initialization and literals. We will also look at scope and lifetime of

00:27variables. So let's get started with data types and declarations in C++. The

00:36basic difference between computers and other machines is that we use a language

00:41to operate computers. But we are not talking about programming languages or

00:46the shell. Here we are talking about the characters that are input, processed and

00:53output by a computer. So what is the character set that is or can be

00:59processed by a computer? The basic character set is ASCII which is American

01:05Standard Code for Information Interchange. This was published as USAS X3.4-1968 and

01:14later renamed ANSI X3.4-1968. The ISO standard is ISO 646:1983. So what is

01:26ASCII? We can run the ASCII command and we can see that ASCII is a 7-bit character

01:31encoding and it has 2 to the power 7 that is 128 code points 0 through 127. There are

01:41control character code points 0 to 31, then punctuation characters 32 to 47 and then the

01:50digits 48 which is character 0 to 57, character 9 and then once again punctuation or special

01:58characters 58 to 64. Then there are upper case English characters 65 which is upper case A to 90 which

02:09is upper case Z and then 91 to 96 again special characters. Then we have lower case

02:17characters, code point 97 which is lowercase a to 122 which is lowercase z, and then 123 to 126 are again

02:29special characters and finally 127 is DEL, a control character. So this is ASCII, the basic character set,

02:38and it is used for storing characters in the 8-bit char type. Now ASCII uses only 7 bits and there

02:47have been attempts to make extended ASCII using all the 8 bits in a byte for encoding characters

02:54outside the English alphabet. But these attempts have not yielded any standard encoding and it is sort

03:02of non-standard. But the work to encode characters from other languages has continued and we have the

03:09Unicode standard covering symbols from almost all languages in the world. Unicode defines around 154,998

03:20characters of various scripts and it is capable of encoding more than 1.1 million characters. Now

03:29how do we convert the code points into bytes which can be stored or transferred over the network? This

03:35is done using the Unicode Transformation Format, UTF, and there are three schemes: UTF-8, UTF-16 and UTF-32.

03:46UTF-8 uses 1 to 4 bytes per character and the first 128 characters are the same as those in ASCII and

03:55also UTF-8 uses 1 byte for these 128 characters and also the same binary value for the code points as in ASCII. So

04:07UTF-8 is backward compatible with ASCII and is the most popular among the three schemes. UTF-16 uses 2 or 4 bytes

04:16per character and UTF-32 uses a fixed length 4 byte format for each character. To sum up by default,

04:25we work with the ASCII character set but there are provisions to use characters from languages other than English

04:32using one of the UTF schemes.
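The UTF-8 scheme described above can be sketched in code. This is only an illustration: the function name `utf8Encode` is a hypothetical helper, not something from the video, and it assumes the usual UTF-8 bit layout of 1 to 4 bytes per code point.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for values that are not valid code points.
std::string utf8Encode(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: same value as the ASCII code point
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // Surrogate values are reserved for UTF-16 and are not characters
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, `utf8Encode(U'A')` gives the single byte 0x41, the same as ASCII, while the euro sign U+20AC needs three bytes.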

04:34A token is the smallest unit of a program. If we think of a program as a building, a token is a brick.

04:43It is the smallest sequence of characters recognized as a unit of a program by the compiler.

04:49Tokens are reserved words, user-defined identifiers or names for variables and functions, constant literals

04:57like 7, 3E10, 0777, 0x1F, etc., operators like plus, minus, star, slash, and punctuation symbols like

05:12semicolon, braces, square brackets, parentheses. Then we have the reserved words. Each of these words is

05:20kind of standard, etched in stone. They have to be used as they are, that is, the same spelling and same case.

05:29Each of these keywords has a specific meaning and together with user-defined names and other symbols

05:35they make C++ statements. You don't need to remember all of them all the time but you get more familiar

05:44as you get more experience and they become a part of your vocabulary. And the names.

05:51Names are programmer-defined identifiers for variables and functions. A name is a sequence of letters and

05:57digits. It should start with a letter. The underscore character is considered a letter. There is no restriction

06:04on the length of a name as per C++ but linkers can't handle arbitrarily long names

06:10so it is better to keep names to a reasonable length. Some of the examples of names are item underscore number,

06:19part number, employee underscore record, x1 p2, get user input but the following are not valid names

06:27like 417 underscore x, employee space name, try (try is a reserved word so it can't be a name),

06:36my dot name, triple four, semicolon. All names are case sensitive. Names often comprise multiple words

06:44so how do we concatenate words so that the resulting name is readable? There are multiple conventions.

06:51The first is snake case. Here the identifier is in lower case and the words are joined together

06:57using underscores. This is really the C style of programming and in C++ it is mostly used for names of constants

07:05like days_in_week. Then there is Pascal case. The first character of each word is

07:13in uppercase and there is no underscore. The words are simply concatenated. For example,

07:20EmployeeRecord. This convention is mostly used for names of classes, structures and enums.

07:28Next is camel case. This is the same as Pascal case except that the first character

07:36is lower case. This is used for variable and function names in C++. For example, newEmployee. And we have

07:44the screaming snake case where all letters are in uppercase and the words are joined with underscores. It is

07:52mostly used for constants and macros. For example, MAX_ELEMENTS. Data types. The basic data types are

08:01Boolean, that is bool, for holding Boolean values. Character types like char and wchar_t. These are for storing

08:11characters. Integer types like int and long for integer values. Floating point types like double and long double

08:21for storing floating point values. And there is void type which is for indicating non-availability of type

08:30information. These are the basic data types. Then there are three types which use declarator operators

08:39like star, square brackets and ampersand on the above types. These are pointer types like int*, array types,

08:50for example double[], and reference types like int&. Boolean, character and integer types together

08:59are known as integral types. Integral types and floating point together are arithmetic types and all these are

09:08built-in types. These are available by default in C++. Apart from these there are user defined types.

09:16The user defined types are structures and classes that is struct and class. These provide encapsulation of data

09:27and code for abstraction of real life entities or concepts. Then there are enumeration types,

09:35that is enum and enum class. These provide a set of values for a user-defined type.

09:44Boolean or bool type is a very basic data type and an object of type bool can have either of the two

09:51values false and true. A bool is the result of a condition. For example, int x = -5

09:59and we have bool flag = (x != 0). That is, x is non-zero, which is true.

10:08Predicates are expressions which evaluate to false or true. For example, here is a function

10:15bool isOdd(int x) which returns (x & 1) == 1. So isOdd is a predicate and it returns false or true

10:29for the argument x. Here we are doing a bitwise AND operation between x and 1. So x has some value

10:38and 1 has just 1 in the least significant bit. The rest of the bits are 0. The result is 1 if the LSB of x is 1,

10:48else it is 0. And if the result is 1, x is odd. A variable of type bool is like a small integer

10:55which can take one of the two values false and true. False is represented by 0 and true is equal to 1.

11:04So an integer value of 0 converts to false and any non-zero integer value converts to true.

11:12The comparison of the result of the bitwise AND with 1 is redundant. We can just return x & 1.

11:20If the result is 1, x is odd and if it is 0, x is even.
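Here is a minimal sketch of the predicate in both forms. The spellings `isOddVerbose` and `isOdd` are assumptions, since the transcript only gives the spoken name.

```cpp
// Verbose form: compare the bitwise AND result with 1 explicitly.
bool isOddVerbose(int x) {
    return (x & 1) == 1;
}

// Shorter form: the int result of x & 1 converts to bool,
// 0 becomes false and 1 becomes true.
bool isOdd(int x) {
    return x & 1;
}
```

Both functions give the same answer for every int, so the comparison in the first one adds nothing.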

11:26Boolean initialization

11:28How do we initialize a boolean variable? There are two ways.

11:32First, using the = sign. For example, bool b1 = true.

11:39The second is using braces. For example, bool b2{false}.

11:46Initialization of variables using braces is better because it is more strict and the compiler flags

11:53narrowing errors. We can initialize using integers. There is a close correspondence between integers

12:00and boolean as 0 converts to false and anything non-zero converts to true. So we can say

12:07bool b1 = 0, so b1 is false. bool b2{1}, so b2 is true.

12:17bool b3 = 5. 5 is non-zero, so b3 is true. And if we say bool b4{2},

12:26we get the narrowing error. The problem is that when b4 is initialized with 2,

12:31b4 becomes true, which is 1. So some data is lost and the value of b4 becomes narrow.

12:40This is the narrowing error. Actually, converting from a broad type like int to a small type like bool

12:48is a bit of a problem. So a good way to initialize a bool from an integer is int i = 2,

12:55bool b4{i != 0}. Since i is 2, which is not equal to 0, b4 is true. Similarly with pointers,

13:05a null pointer converts to false and a non-null pointer converts to true. So if we have int i = 7,

13:14int* ptr = &i, ptr points to i. bool b5 = ptr. Since ptr is not a

13:25null pointer, b5 is true. And we define another pointer, int* ptr1 = nullptr. So

13:34if we say bool b6 = ptr1, since ptr1 is a null pointer, b6 is false. And if we use braces for

13:45initialization and say bool b7{ptr}, we get the narrowing error. So we can say

13:55bool b7{ptr != nullptr} and since ptr is not equal to the null pointer, b7 is true.
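The initializations discussed above can be collected in one sketch. The struct `BoolDemo` and the function `boolDemo` are hypothetical wrappers added only so the values can be inspected. The two lines that trigger the narrowing error are left as comments, and whether the pointer case is rejected may depend on the compiler and language standard in use.

```cpp
// Hypothetical container for the results, not part of the original example.
struct BoolDemo {
    bool b1, b2, b3, b4, b5, b6, b7;
};

BoolDemo boolDemo() {
    bool b1 = 0;              // 0 converts to false
    bool b2{1};               // 1 is representable as bool, so no narrowing
    bool b3 = 5;              // any non-zero value converts to true

    // bool bad{2};           // narrowing error: 2 does not fit in bool
    int i = 2;
    bool b4{i != 0};          // explicit comparison, no narrowing

    int n = 7;
    int* ptr = &n;            // ptr points to n
    int* ptr1 = nullptr;
    bool b5 = ptr;            // non-null pointer converts to true
    bool b6 = ptr1;           // null pointer converts to false

    // bool bad2{ptr};        // reported as narrowing in the discussion above
    bool b7{ptr != nullptr};  // explicit comparison works with braces

    return BoolDemo{b1, b2, b3, b4, b5, b6, b7};
}
```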

14:05Character types. There are six distinct character types and these are char, signed char, unsigned char,

14:15wchar_t, char16_t and char32_t. Char is the basic type of character. It is one 8-bit byte long. Char is

14:26like a small 8 bit integer. It can be signed or unsigned. C++ does not specify whether when we say

14:35char it is signed or unsigned. It depends on the machine architecture and compiler. If char is unsigned,

14:43it can hold a value from 0 to 255 and if it is signed, it holds a value from minus 128 to 127

14:53on a two's complement machine and minus 127 to plus 127 on a ones' complement machine. A lot of confusion

15:02can be saved if we just store characters in the char type and do not treat it as a small integer type for

15:12holding arbitrary integers. Then signed char is the same as char except that it is guaranteed that a variable

15:20of type signed char can hold both positive and negative values, and unsigned char is just like the char type

15:31except that it is guaranteed that only non-negative values can be stored in a variable of type unsigned char.

15:40wchar_t is the wide character type that supports a bigger character set such as

15:47Unicode. Its size is large enough to hold the biggest character in the supported character set.

15:54Then char16_t and char32_t are for 16 and 32 bit characters like UTF-16 and UTF-32.
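A few of these properties can be checked at compile time. The following is a sketch only. The function name `plainCharIsSigned` is hypothetical, and the answer it gives depends on the machine architecture and the compiler.

```cpp
#include <climits>
#include <limits>

// Sizes and ranges that hold on every implementation.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "signed char and unsigned char are one byte too");
static_assert(std::numeric_limits<signed char>::min() < 0,
              "signed char can hold negative values");
static_assert(std::numeric_limits<unsigned char>::min() == 0,
              "unsigned char holds only non-negative values");
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t has at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t has at least 32 bits");

// Reports whether plain char is signed on this implementation.
bool plainCharIsSigned() {
    return std::numeric_limits<char>::is_signed;
}
```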

16:04So although we have listed 6 types of char, plain char always has the same range and representation as

16:10either signed char or unsigned char, depending upon the machine

16:18architecture and the compiler, although the language still treats it as a distinct type. Also, with C++20 there is char8_t specifically for handling UTF-8 encoded

16:31characters. Now character literals. What are character literals? These are the actual character values

16:40that are written in a program. To specify a single character we just enclose it in single quotes like

16:47'a', 'b', 'P', '7' and if we are using the standard ASCII character encoding,

16:56'a' means a value of 97, 'b' means 98, 'P' means 80,

17:04and '7' means 55 and so on. There are some special control characters which have special notation in

17:13C++ and these are new line or line feed. It is denoted by \n or by

17:23std::endl. The corresponding hexadecimal value is 0a. Then carriage return, CR,

17:30is denoted by \r and has the value 0d. Going down the list, backslash is denoted by

17:41\\ and has the value 5c, single quote is \' with a value of 27, and

17:51double quote is \" with a value of 22. We can denote any octal bit pattern in a byte by writing

18:01a backslash and the three octal digits, and similarly for hexadecimal we can write \x and the two

18:11hexadecimal digits. There are three forms of int: int, signed int and unsigned int. Ints are always signed

18:20so int and signed int are the same thing. So effectively we have signed int and unsigned int and if you like

18:29you can skip int from the terminology and just say signed and unsigned. There are four sizes of int

18:37which are short int, int, long int and long long int. As before we can skip int from the integer sizes

18:47with multiple words and just say short, int, long and long long.

18:53The size of short is less than or equal to the size of int, which is less than or equal to the size of long, and

19:01that is less than or equal to the size of long long. Since the relational operator between the sizes of

19:08different sizes of integers is less than or equal to, there might be fewer than four distinct sizes of

19:16integers; for instance, long and long long may be equal. It depends on the implementation. Integer literals are

19:24integer values that are hard coded in a program. Integers are written in three number systems

19:31decimal, octal and hexadecimal. If a number starts with zero and the next character is not x it is an

19:40octal number. Hexadecimal numbers start with 0x. In case of hexadecimal x can be in either case

19:49it can be lowercase or uppercase and similarly the hexadecimal digits can also be in either case.

19:56If a number does not start with 0 or 0x and has only digits 0 to 9 it is a decimal number.

20:04For example 2178 is a decimal number, 0172 is octal, 0 is octal, 0x48af is hexadecimal and

20:1878912341 is a decimal number. A number like 0178 will result in a compilation error because

20:278 is not an octal digit. Also we have suffixes like u for unsigned, l for long and ll for long long.

20:37The compiler uses the suffix and the literal value to determine the type of the literal.
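The number systems and suffixes can be checked with a short sketch. The constant names below are made up for illustration, and the value 122 was chosen only because it is easy to verify by hand.

```cpp
#include <type_traits>

// The same value written in the three number systems.
constexpr int fromDecimal  = 122;
constexpr int fromOctal    = 0172;   // 1*64 + 7*8 + 2
constexpr int fromHex      = 0x7a;   // 7*16 + 10
constexpr int fromHexUpper = 0X7A;   // x and the digits may be in either case

static_assert(fromOctal == fromDecimal, "octal and decimal agree");
static_assert(fromHex == fromDecimal && fromHexUpper == fromDecimal,
              "hexadecimal and decimal agree");

// int broken = 0178;  // does not compile: 8 is not an octal digit

// Suffixes select the type of the literal.
static_assert(std::is_same<decltype(2178), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(2178u), unsigned int>::value, "u: unsigned");
static_assert(std::is_same<decltype(2178L), long>::value, "L: long");
static_assert(std::is_same<decltype(2178LL), long long>::value, "LL: long long");
static_assert(std::is_same<decltype(0x7fULL), unsigned long long>::value,
              "ULL: unsigned long long");
```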

20:44For example, an integer literal like 0x7fULL is of type unsigned long long, and since we are specifying

20:55ULL as the suffix, the literal will be unsigned long long on all machines. So suffixes make the code

21:08portable across different architectures. Floating point types. The floating point types are for

21:16representing real numbers. Real numbers have fractions and these fractions can have infinite precision.

21:25For example, consider irrational numbers like pi and root 2, and recurring decimals like 0.3333 and so on.

21:34These require infinite precision for accurately representing in the computer. In general,

21:41real numbers require precision and floating point types in the order of increasing precision are

21:47float, double and long double. The double type is mostly used for floating point operations.

21:55Floating point literals. The default type for floating point literals is double. For example,

22:028.12, 0.79, 0.43, 7.1, 1.0, 3.0e10, 6.626e-34. All these literals are of type double. There should not be any space

22:22inside a floating point literal. For example, 6.626 e-34 is an error. Now the default type for a floating point

22:33literal is double. And if you want a floating point literal of type float or long double,

22:40you have to put a suffix like f or l respectively. For example, 2.33f, 6.626e-34f,

22:520.89l, 1.0545718171e-34l. The first two are of type float and the last two are of type long double.
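The literal types can be confirmed at compile time with a sketch like the following. The constant name `big` is hypothetical.

```cpp
#include <limits>
#include <type_traits>

// Without a suffix, a floating point literal is a double.
static_assert(std::is_same<decltype(8.12), double>::value, "plain literal");
static_assert(std::is_same<decltype(3.0e10), double>::value, "with exponent");
static_assert(std::is_same<decltype(6.626e-34), double>::value, "negative exponent");

// The f suffix gives float, the l suffix gives long double.
static_assert(std::is_same<decltype(2.33f), float>::value, "f suffix");
static_assert(std::is_same<decltype(6.626e-34f), float>::value, "f after exponent");
static_assert(std::is_same<decltype(0.89l), long double>::value, "l suffix");

// Precision does not decrease from float to double to long double.
static_assert(std::numeric_limits<float>::digits <= std::numeric_limits<double>::digits,
              "double is at least as precise as float");
static_assert(std::numeric_limits<double>::digits <= std::numeric_limits<long double>::digits,
              "long double is at least as precise as double");

constexpr double big = 3.0e10;   // 3.0 times 10 to the power 10
```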

23:08Void is syntactically a fundamental type. However, there are no objects of type void. Void indicates

23:16that the type of an object is not known. So it works like a placeholder type for such objects. It is

23:24used in pointer declarations where the type of the object pointed to is not known. Also sometimes

23:32functions are for effect only and do not return a value. To indicate that a function does not return

23:40a value, we declare its return type as void. Declarations. All variables and function names

23:49need to be declared before use. We need to specify the type of a name before its use. And sometimes

23:58some additional information is also required in a declaration. For example, char c, where char is the type

24:06of the name c. int i, j, k: we can declare multiple variables in a single statement. double d,

24:14constexpr double p = 2.3, const double q = p, std::string str ims string, auto l =

24:275, extern int error_no, and a std::vector<int> called my vec, which is initialized

24:36to the integers 1, 2, 4, 8, 16. A declaration can have 5 parts of which 2 are mandatory and 3 are optional.

24:48First, optional prefix specifiers like static, virtual, extern. Then a base type like char, int, double, std::vector,

25:00std::string. The base type is mandatory. Then a declarator which contains

25:09a name and optionally some declarator operators. For example, i, fruits[7],

25:20(*f)(). A declarator is mandatory. Then optional function suffix

25:30specifiers like const and noexcept, and an optional initializer or function body.
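The parts of a declaration can be seen in a handful of examples. This is a sketch under assumptions: the namespace `demo`, the initial values, the string contents and the names `myVec`, `counter` and `answer` are all hypothetical and chosen only to make the snippet compile on its own.

```cpp
#include <string>
#include <vector>

namespace demo {

char c = 'x';                            // base type char, declarator c, initializer
int i = 0, j = 0, k = 0;                 // one base type, three declarators
double d = 0.0;
constexpr double p = 2.3;                // value known at compile time
const double q = p;                      // cannot be modified after initialization
std::string str{"hello"};                // base type from the standard library
auto l = 5;                              // type deduced from the initializer: int
std::vector<int> myVec{1, 2, 4, 8, 16};  // initialized with a list of values

static int counter = 0;                  // prefix specifier: static
int fruits[7] = {};                      // declarator operator []: array of 7 int
int answer() noexcept { return 42; }     // suffix specifier noexcept, function body
int (*f)() = answer;                     // declarator (*f)(): pointer to function

}  // namespace demo
```

In every line the base type and the declarator are present, while the prefix specifier, the suffix specifier and the initializer or body appear only where needed.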

25:39Definitions. A definition is a declaration that causes creation of an object. For example, int i. This

25:48declaration is also a definition. When this statement is executed, an integer object is created which can be

25:56referred to by the name i. However, consider this statement: extern int error_no. Here an integer

26:06object is not created. The extern keyword means that error_no is defined in some other source

26:14file and here we are just saying that the name error_no used in this source file is an integer so that

26:24this file can be compiled and object code can be created. The actual error_no is defined in some

26:33other source file and is linked to this name at the time of linking and production of the final

26:40executable code. So the declaration extern int error_no is not a definition.
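The difference can be sketched in a single file. Normally the definition would sit in another source file. Here it is placed at the bottom only so that the example links on its own. The function `readError` and the value 3 are hypothetical.

```cpp
// Declaration only: tells the compiler the type of the name.
// No integer object is created by this line.
extern int error_no;

// The name can be used as soon as it has been declared.
int readError() {
    return error_no;
}

// Definition: this is where the integer object is actually created.
// In a real project this line would live in some other source file.
int error_no = 3;
```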

26:48Scope. The scope of a name is the part of the program where the name is visible and can be used. The scope of a

26:56name starts from the point it is declared. If it is a global variable, it extends to the end of the file.

27:04If it is a local variable, it starts from the point of declaration and extends till the closing

27:10brace of the block. If a local variable has the same name as a global variable, the local variable hides the

27:17global variable and that name refers to the local variable. If a global variable is hidden by a

27:23local variable of the same name, we can use the double colon scope resolution operator to bypass the local

27:31variable and access the global variable. If a local variable is hidden by another local variable,

27:38there is no way to access the hidden local variable. For example, in this program,

27:46there is a global variable x which is initialized to 1. Inside the function main,

27:52we define a local variable int x. So, when we say x = 7, it assigns 7 to the local x. Now,

28:00we have a new block and we have a new local x and this hides the previous local x.

28:07So, x = 21 assigns to the new local x and then we add 3 to ::x. The double colon means the

28:17global x. It was 1 initially, it becomes 4 now. We print the global x, it prints 4, and we print the local x,

28:27it prints 21. The block gets over and we print x. Now, it is a local x and it prints 7 and the program terminates.
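The program is not reproduced in the transcript, so here is a reconstruction from the description. The function name `scopeDemo` is a hypothetical stand-in for main, used so the snippet can be called and checked, and returning the outer local x is an addition for the same reason.

```cpp
#include <iostream>

int x = 1;                          // global x

int scopeDemo() {
    int x;                          // local x hides the global x
    x = 7;                          // assigns to the local x
    {
        int x;                      // new local x hides the previous local x
        x = 21;                     // assigns to the innermost x
        ::x += 3;                   // :: selects the global x, 1 becomes 4
        std::cout << ::x << '\n';   // global x
        std::cout << x << '\n';     // innermost local x
    }
    std::cout << x << '\n';         // the first local x is visible again
    return x;
}
```

Called once from main, this prints 4, 21 and 7 on separate lines, matching the walk-through above.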

28:37Lifetime. When a program is running, its objects have a lifetime which is less than or equal to

28:45the lifetime of the program. The lifetime of an object starts when it is created and ends when it

28:52is destroyed. The lifetime of variables like char, int, double etc depends on the storage class.

29:01The storage class has two values, automatic and static. Automatic. All variables defined in a function,

29:09unless marked otherwise, have automatic storage class. These objects are created on the stack.

29:16They come into existence when they are defined and their lifetime ends at the closing brace of the function

29:24or block in which they are defined. Static. All global variables have a static storage class.

29:32Also, when a variable defined in a function is marked static, it has static storage class. Objects of static storage

29:42class have the lifetime of the program. Apart from these, objects are created on the free store.

29:50Their lifetime starts at the time of creation using the new operator and ends at the time of deletion

29:58using the delete operator. We have looked at data types and declarations in C++ and also scope

30:07and lifetime of variables. You can find all this information at the softprayog.in website. Here is the QR code

30:17and the link is there in the description. Please subscribe to the SoftPrayog channel, click on the bell icon

30:24and enable notifications. Thanks very much for watching. Take care and stay safe.
