Monday, September 5, 2011

INMODs with FastLoad


Using INMODs with FastLoad

When you find that FastLoad does not read the file type you have or you wish to control the access for any reason, then it might be desirable to use an INMOD. An INMOD (Input Module), is fully compatible with FastLoad in either mainframe or LAN environments, providing that the appropriate programming languages are used. However, INMODs replace the normal mainframe DDNAME or LAN defined FILE name with the following statement: DEFINE INMOD=<INMOD-name>. For a more in-depth discussion of INMODs, see the chapter of this book titled “INMOD Processing”.

RESTARTING Fastload In-Depth


Restarting FastLoad — A More In-Depth Look

How the CHECKPOINT Option Works

CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter “CHECKPOINT [n]” is included in the BEGIN LOADING clause the system will stop loading momentarily at increments of [n] rows.
At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains a list of all currently running FastLoad jobs and the last successfully reached checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last successfully reported checkpoint prior to the error. It will then restart from the record immediately following that checkpoint and start building the next block of data to load. If such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row.

Restarting with CHECKPOINT

Sometimes you may need to restart FastLoad. If the FastLoad script requests a CHECKPOINT (other than 0), then it is restartable from the last successful checkpoint. Therefore, if the job fails, simply resubmit the jobHere are the two options: Suppose Phase 1 halts prematurely; the Data Acquisition phase is incomplete. Resubmit the FastLoad script. FastLoad will begin from RECORD 1 or the first record past the last checkpoint. If you wish to manually specify where FastLoad should restart, locate the last successful checkpoint record by referring to the SYSADMIN.FASTLOG table. To specify where a restart will start from, use the RECORD command. Normally, it is not necessary to use the RECORD command — let FastLoad automatically determine where to restart from.
If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of the target table.

Restarting without CHECKPOINT (i.e., CHECKPOINT 0)

When a failure occurs and the FastLoad Script did not utilize the CHECKPOINT (i.e., CHECKPOINT 0), one procedure is to DROP the target table and error tables and rerun the job. Here are some other options available to you:
Resubmit job again and hope there is enough PERM space for all the rows already sent to the unsorted target table plus all the rows that are going to be sent again to the same target table. Other than using space, these rows will be rejected as duplicates. As you can imagine, this is not the most efficient way since it processes many of the same rows twice.
If CHECKPOINT wasn’t specified, then CHECKPOINT defaults to 100,000. You can perform a manual restart using the RECORD statement. If the output print file shows that checkpoint 100000 occurred, use something like the following command: [RECORD 100001;]. This statement will skip records 1 through 10000 and resume on record 100001.

FastLoad Finishes


What Happens When FastLoad Finishes

You Receive an Outcome Status

The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that something wasn’t perfect and needs to be fixed.
The locks will not be removed and the error tables will not be dropped without a successful completion. This is because FastLoad assumes that it will need them for its restart. At the same time, the lock on the target table will not be released either. When running FastLoad, you realistically have two choices once it is started. First choice is that you get it to run to a successful completion, or lastly, rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart.

You Receive a Status Report

What happens when FastLoad finishes running? Well, you can expect to see a summary report on the success of the load. Following is an example of such a report.

Line 1:
Line 2:
Line 3:
Line 4:
Line 5:
TOTAL RECORDS READ = 1000000
TOTAL ERRORFILE1 = 50
TOTAL ERRORFILE2 = 0
TOTAL INSERTS APPLIED = 999950
TOTAL DUPLICATE ROWS = 0
Figure 4-7
The first line displays the total number of records read from the input file. Were all of them loaded? Not really. The second line tells us that there were fifty rows with constraint violations, so they were not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that there were zero entries into the second error table, indicating that there were no duplicate Unique Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the duplicates would only have been counted. They are not stored in the error tables anywhere. When FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the number of records read in line 1.
Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows that are counted. This is due to the fact that a error seldom occurs on a checkpoint (quiet or quiescent point) when nothing is happening within FastLoad. Therefore, some number of rows will be sent to the AMPs again because the restart starts on the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic.

You Can Troubleshoot

In the example above, we know that the load was not entirely successful. But that is not enough. Now we need to troubleshoot in order identify the errors and correct them. FastLoad generates two error tables that will enable us to find the culprits. The first error table, which we named Errorfile1, contains just three columns: The column ErrorCode contains the Teradata FastLoad code number to a corresponding translation or constraint error. The second column, named ErrorField, specifies which column in the table contained the error. The third column, DataParcel, contains the row with the problem. Both error tables contain the same three columns; they just track different types of errors.
As a user, you can select from either error table. To check errors in Errorfile1 you would use this syntax:
image\img00386.gif
Corrected rows may be inserted to the target table using another utility that does not require an empty table.
To check errors in Errorfile2 you would the following syntax:
image\img00011.gif
The definition of the second error table is exactly the same as the target table with all the same columns and data types.

RESTARTING FastLoad


When You Cannot RESTART FastLoad

There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script. If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable:
·       The Error Tables are DROPPED
·       The Target Table is DROPPED
·       The Target Table is CREATED
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job one half way through it. Maybe the mainframe went down. Well, it is not really a big deal because FastLoad is so lightning-fast that you could probably just RERUN the job for small data loads.
However, when you are loading a billion rows, this is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you might want to make sure that the job is totally restartable. Let’s see how this is done.



When You Can RESTART FastLoad

If all of the following conditions are true, then FastLoad is ALWAYS restartable:
·       The Error Tables are NOT DROPPED in the script
·       The Target Table is NOT DROPPED in the script
·       The Target Table is NOT CREATED in the script
·       You have defined a checkpoint
So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a table whose data changes so much that you typically drop it monthly and build it again. Let’s go back to the script we just reviewed above and see how we can break it into the two parts necessary to make it fully RESTARTABLE. It is broken up below.
STEP ONE: Run the following SQL statements in Queryman or BTEQ before you start FastLoad:

DROP TABLE SQL01.Department;
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;
DROPS TARGET TABLE AND ERROR TABLES
image\img00385.gif
CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATA BASE IN TERADATA
Figure 4-6
First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything. Next, if needed, you create the empty table structure needed to receive a FastLoad.
STEP TWO: Run the FastLoad script
This is the portion of the earlier script that carries out these vital steps:
·       Defines the structure of the flat file
·       Tells FastLoad where to load the data and store the errors
·       Specifies the checkpoint so a RESTART will not go back to row one
·       Loads the data
If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with the next record after the last checkpoint. Now, with that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded.
You may optionally use the RECORD command to manually restart on the next record after the one indicated in the message.
Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END LOADING. It will then restart right into Phase 2.

Converting Data Types with FastLoad


Converting Data Types with FastLoad

Converting data is easy. Just define the input data types in the input file. Then, FastLoad will compare that to the column definitions in the Data Dictionary and convert the data for you! But the cardinal rule is that only one data type conversion is allowed per column. In the example below, notice how the columns in the input file are converted from one data type to another simply by redefining the data type in the CREATE TABLE statement.
FastLoad allows six kinds of data conversions. Here is a chart that displays them:
IN FASTLOAD YOU MAY CONVERT
CHARACTER DATA
TO
NUMERIC DATA
FIXED LENGTH DATA
TO
VARIABLE LENGTH DATA
CHARACTER DATA
TO
DATE
INTEGERS
TO
DECIMALS
DECIMALS
TO
INTEGERS
DATE
TO
CHARACTER DATA
NUMERIC DATA
TO
CHARACTER DATA
Figure 4-4
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!

Sample FastLoad Script


Sample FastLoad Script

Let’s look at an actual FastLoad script that you might see in the real world. In the script below, every comment line is placed inside the normal Teradata comment syntax, [/*. . . . */]. FastLoad and SQL commands are written in upper case in order to make them stand out. In reality, Teradata utilities, like Teradata itself, are by default not case sensitive. You will also note that when column names are listed vertically we recommend placing the comma separator in front of the following column. Coding this way makes reading or debugging the script easier for everyone. The purpose of this script is to update the Employee_Profile table in the SQL01 database. The input file used for the load is named EMPS.TXT. Below the sample script each step will be described in detail.
Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The reason is that when any of the tables that FastLoad is using are dropped, the script cannot be restarted. It can only be rerun from the beginning. Since FastLoad has restart logic built into it, a restart is normally the better solution if the initial load attempt should fail. However, for purposes of this example, it shows the table structure and the description of the data being read.

/* !/bin/ksh*  */
/* ++++++++++++++++++++++++++++*/
/* FASTLOAD SCRIPT TO LOAD THE                 */
/* Employee_Profile TABLE                              */
/* Version 1.1                                                 */
/* Created by Coffing Data Warehousing          */
/* ++++++++++++++++++++++++++++*/

/* Setup the FastLoad Parameters */
Runs from a shell script.
Always good to identify the script and author in comments.
Since this script does not drop the target or error tables, it is restartable. This is a good thing for production jobs.
SESSIONS 100; /*or, the number of sessions supportable*/
Specify the number of sessions to logon.
TENACITY 4; /* the default is no tenacity, means no retry */
SLEEP 10; /* the default is 6, means retry in 6 minutes */

LOGON CW/SQL01,SQL01;
Tenacity is set to 4 hr;
Wait 10 Min between retries.
SHOW VERSIONS; /* Shows the Utility’s release number */
Display the version of FastLoad.
/* Set the Record type to a comma delimited for FastLoad */ RECORD 2;
Starts with the second record.
SET RECORD VARTEXT ‘,’;
Specifies if record layout is vartext with a comma delimiter.
image\img00383.gif
Notice that all fields are defined as VARCHAR. When using VARTEXT, the fields do not contain the length field like in these formats: text, FastLoad, or unformatted.
FILE= EMPS.TXT;
Defines the flat file name.
/* Optional to show the layout of the input */ SHOW
Specifies table to load and lock.
/* Begin the Load and Insert Process into the */
/* Employee_Profile Table */

BEGIN LOADING SQL01.Employee_Profile
ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
CHECKPOINT 100000;
Names the error tables. Sets the number of rows at which to pause & record progress in the restart log before loading further.
image\img00384.gif
Defines the insert statement to use for loading the rows.
END LOADING;
Continues loading process with Phase 2.
LOGOFF;
Logs off of Teradata.
Figure 4-3
Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].
Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot [“.”] in front of them and therefore need a semi-colon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? We do because as FastLoad’s capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited.
Step Three: If the input file is not a FastLoad format, before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed by FastLoad. We have used VARTEXT in our example with a comma delimiter. The other options are FastLoad, TEXT, UNFORMATTED OR VARTEXT. You need to know this about your input file ahead of time.
Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file for the load.
Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named “Emp_Err1” and “Emp_Err2”. Phase 1 uses “Emp_Err1” because it comes first and Phase 2 uses “Emp_Err2”. The names are arbitrary, of course. You may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps insure this uniqueness between multiple FastLoad jobs occurring in the same database.
In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three “R’s’ in grade school (“reading, ‘riting, and ‘rithmatic”). There are two very different, yet equally important, R’s to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad.
Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terrier’s do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, it prevents loading rows as they arrive from different time zones. However, to accomplish this processing, simply omit the END LOADING on the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster!
Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That’s a pretty clever way to do a partitioned type of data load.
Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, you are responsible to check it, take the appropriate action and drop the table manually.

FastLoad Example


A FastLoad Example in its Simplest Form

The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts are very simple. Think of FastLoad as:
·       Logging onto Teradata
·       Defining the Teradata table that you want to load (target table)
·       Defining the INPUT data file
·       Telling the system to start loading
This first script example is designed to show FastLoad in its simplest form. The actual script is in the left column and our comments are on the right.
Logon CDW/jones, cowboys;
LOGON TO TERADATA
image\img00378.gif
Creates the department target table in the sql101 database in Teradata
/* in this sample script, the create shows what the table looks like, however, this is not a good practice in a production script */
image\img00379.gif
Defines the fields in the record for the flat file being read and FILE= provides the name the input file
image\img00380.gif
Specifies table to load for locking purposes, names the error tables and sets the checkpoint for restart processing to 15000.
image\img00381.gif
The INSERT used to populate the table
END LOADING;
Tells FastLoad to begin Phase 2
LOGOFF;
Disconnects the Teradata sessions

FastLoad Commands


FastLoad Commands

Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in control of the load process. Consider this your personal redi-reference guide! You will notice that there are only a few SQL commands that may be used with this utility (Create Table, Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.
AXSMOD
Short for Access Module, this command specifies input protocol like OLE-DB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.
BEGIN LOADING
This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load. CHECKPONT and INDICATORS are subordinate commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be discussed below in detail, is not the default for FastLoad. It must be specified in the script. INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies columns with nulls and uses a bitmap at the beginning of each row to show which fields contain a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to identify the null column. The INDICATORS option does not work with VARTEXT.
CREATE TABLE
This defines the target table and follows normal syntax. If used, this should only be in the initial script. If the table is being loaded, it cannot be created a second time.
DEFINE
This names the Input file and describes the columns in that file and the data types for those columns.
DELETE
Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because the table is locked.
DROP TABLE
Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time, this is not a good thing to do within a FastLoad script since it cancels the ability to restart.
END LOADING
Success! This command indicates the point at which that all the data has been transmitted. It tells FastLoad to proceed to Phase II. As mentioned earlier, it can be used as a way to partition data loads to the same table. This is true because the table remains empty until after Phase II.
ERRLIMIT
Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). This handy command can be a lifesaver when you are not sure how corrupt the data in the Input file is. The more corrupt it is, the greater the clean up effort required after the load finishes. ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows beyond which FastLoad will immediately precede to the abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, all the rows in the error table are not in the data table. That becomes your responsibility.
HELP
Designed for online use, the Help command provides a list of all possible FastLoad commands along with brief, but pertinent tips for using them.
HELP TABLE
Builds the table columns list for use in the FastLoad DEFINE statement when the data matches the Create Table statement exactly. In real life this does not happen very often.
INSERT
This is FastLoad’s favorite command! It inserts rows into the target table.
LOGON/LOGOFF or, QUIT
No, this is not the WAX ON / WAX OFF from the movie, The Karate Kid! LOGON simply begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.
NOTIFY
Just like it sounds, the NOTIFY command used to inform the job that follows that some event has occurred. It calls a user exit or predetermined activity when such events occur. NOTIFY is often used for detailed reporting on the FastLoad job’s success.
RECORD
Specifies the beginning record number (or with THRU, the ending record number) of the Input data source, to be read by FastLoad. Syntactically, This command is placed before the INSERT keyword. Why would it be used? Well, it enables FastLoad to bypass input records that are not needed such as tape headers, manual restart, etc. When doing a partition data load, RECORD is used to over-ride the checkpoint. What does this mean???
SET RECORD
Used only in the LAN environment, this command states in what format the data from the Input file is coming: FastLoad, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, FastLoad.
SESSIONS
This command specifies the number of FastLoad sessions to establish with Teradata. It is written in the script just before the logon. The default is 1 session per available AMP. The purpose of multiple sessions is to enhance throughput when loading large volumes of data. Too few sessions will stifle throughput. Too many will preclude availability of system resources to other users. You will need to find the proper balance for your configuration.
SLEEP
Working in conjunction with TENACITY, the SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions are not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to logon would cease.
TENACITY
Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number of sessions it requested to perform its task or all of the loader slots are currently used. TENACITY specifies the amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for FastLoad is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.