Welcome to the Teradata Embedded BTEQ Lab. The main purpose of this lab is to learn about shell scripting and embedded code, and how to handle both with SnowConvert.

Prerequisites

What we will cover

This lab was created because migration projects often run into embedded BTEQ and MLOAD code inside shell files. The shell provides syntax that lets users embed other scripting languages in a BASH script, in this case BTEQ code.

For example, something like this:

echo 'unix command'
echo 'unix command'
bteq << EOF
.logon <systemid>/<userid>,<password>;
select current_timestamp;
.logoff;
.quit;
EOF
echo 'unix command 3'
echo 'unix command 4'
echo 'unix command 5'

The main reason to embed code is that the primary programming language cannot perform a task as conveniently or efficiently as the secondary (embedded) language. But embedding has disadvantages: it requires programmers and editors to understand more than one programming language, it requires more interpreters, and it puts the context of the code at risk.

It is like a conversation: if you are talking with a friend in English and suddenly switch to German, then unless your friend also speaks German, she may have trouble understanding you. Another example: suppose you have a Chinese text that quotes some sentences in Spanish. A Chinese translator expects every sentence to be in Chinese, so when it encounters the Spanish sentences it will not know what to do.

Handling with SnowConvert for Teradata

For Teradata migrations, SnowConvert understands the Teradata SQL dialect as well as the BTEQ dialect, and when converting BTEQ or MLOAD it can detect and switch between those languages. However, SnowConvert does not understand shell scripting, so it cannot determine which sections of a file are written in shell and which in SQL or BTEQ. SnowConvert applies several recovery mechanisms, but the quality of the results will not be optimal.

Our Data Engineering group has found that these code mixes are very common. Sometimes your source files have an extension like .sql or .bteq, but if you look inside them, their content is actually some kind of shell script. One way to find out is to look for a shebang line, which indicates that the file is a shell script.
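For instance, a quick way to flag such misnamed files is to inspect the first line of each candidate. This is a minimal sketch, assuming the suspect files end in .sql or .bteq:

# report any .sql or .bteq file whose first line is a shebang,
# meaning its content is really a shell script
for f in *.sql *.bteq; do
  [ -f "$f" ] || continue            # skip if the glob matched nothing
  head -n 1 "$f" | grep -q '^#!' && echo "$f is actually a shell script"
done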

Does SnowConvert handle these situations at all?

A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter. Typical operations performed by shell scripts include file manipulation, program execution, and printing text.

In other words, a shell script is a text file containing commands that could have been typed directly into the shell. There is no difference in syntax between interactive command-line use and placing the commands in a file. Some commands are only useful interactively (e.g., command-line history recall), and others are too complex to use interactively.

Any shell can be used for writing a shell script. To allow for this, the first line of every script is:

#!/path/to/shell 
#!/bin/ksh

The #! characters (also known as a shebang) tell the system to locate the following pathname, start it up, and feed it the rest of the file as input. Any program which can read commands from a file can be started this way, as long as it recognizes the # comment convention. The program is started, and the script file is given to it as an argument. Because of this, the script must be readable as well as executable.

Typical Unix/Linux Shells

sh

aka "Bourne" shell, small, simple, and (originally) very few internal commands, so it called external programs for even the simplest of tasks. It is always available on everything that looks vaguely like Unix.

csh

The "C" shell has many things in common with the Bourne shell, but many enhancements to improve interactive use. The internal commands used only in scripts are very different from "sh", and similar (by design) to the "C" language syntax.

tcsh

The "TC" shell. Freely available and based on "csh". It has many additional features to make interactive use more convenient.

ksh

The "Korn" shell was written as a major upgrade to "sh" and backward compatible with it, but has many internal commands for the most frequently used functions. It also incorporates many of the features from tcsh which enhance interactive use (command line history recall etc.).

bash

The "Bourne again" shell. Written as part of the GNU/Linux Open Source effort, and the default shell for Linux and Mac OS-X. It is a functional clone of sh, with additional features to enhance interactive use, add POSIX compliance, and partial ksh compatibility.

This particular lab concentrates on the Bourne and the Korn shells.

Environment variables

Environment variables are commonly used within the shell. It is also a common means of configuring services and handling application secrets.

reading and writing variables

To set an environment variable, the export command is used. We give the variable a name, which is what is used to access it in shell scripts and configurations, and then a value to hold whatever data is needed in the variable.

export MY_VALUE=123

To output the value of the environment variable from the shell, we use the echo command and prepend the variable's name with a dollar ($) sign.

echo $MY_VALUE

To unset an environment variable, which removes it altogether, we use the unset command. Simply replacing the environment variable's value with an empty string will not remove it, and in most cases will likely cause problems for scripts or applications expecting a valid value.

The following syntax is used to unset an environment variable:

unset VARIABLE_NAME

You can access those variables inside your shell script in the same way: as we saw when outputting the value from the shell, just reference the variable by prepending its name with a dollar ($) sign in your shell script file.

ENV_VARIABLE=$MY_VALUE
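Putting these commands together, here is a minimal sketch of how an exported variable typically flows into a script; the variable and table names are illustrative, not part of the lab's files:

# set once in the calling environment
export STAGE_DB=ADW_STG

# any script launched from this shell inherits it
echo "Target database: $STAGE_DB"
TARGET_TABLE=$STAGE_DB.DimCurrency
echo "Loading into $TARGET_TABLE"

# remove it when it is no longer needed
unset STAGE_DB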

Control Structures

There are two different kinds of control structures: loops and conditions. The Shell has several loop control structures, including while, until, for, and for-in, and two condition structures: if and case. As in any other programming language, you can compare integers, compare strings, and even perform logical operations. The following tables list some of the commonly used options and operators.

Integer Comparisons

  -gt    Greater-than
  -lt    Less-than
  -ge    Greater-than-or-equal-to
  -le    Less-than-or-equal-to
  -eq    Equal
  -ne    Not-equal

String Comparisons

  -z     Tests for empty string
  =      Equal strings
  !=     Not-equal strings

Logical Operations

  -a     Logical AND
  -o     Logical OR
  !      Logical NOT

File Tests

  -f     File exists and is a regular file
  -s     File is not empty
  -r     File is readable
  -w     File can be written to, modified
  -x     File is executable
  -d     Filename is a directory name
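As a hedged illustration, the following sketch combines several of the operators above inside test brackets; the file name and threshold are invented for the example:

#!/bin/sh
COUNT=7
NAME=""

# integer comparison combined with a string test via logical AND
if [ "$COUNT" -gt 5 -a -z "$NAME" ]; then
  echo "COUNT is above 5 and NAME is empty"
fi

# file tests: regular file that is also readable
if [ -f report.txt -a -r report.txt ]; then
  echo "report.txt exists and is readable"
fi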

The Shell has a set of conditional control structures that allow you to choose what commands to execute. Many of these are similar to conditional control structures found in programming languages, but there are some differences.

Condition Control Structures: if, else, elif, case

  if command then command fi
      if executes an action if its test command is true.

  if command then command else command fi
      if-else executes an action if the exit status of its test command is true; if false, the else action is executed.

  if command then command elif command then command else command fi
      elif allows you to nest if structures, enabling selection among several alternatives; at the first true test, its commands are executed and control leaves the entire structure.

  case string in pattern) command;; esac
      case matches the string value against several patterns; if a pattern matches, its associated commands are executed.
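As a quick, hedged illustration of the if/elif and case forms above (the status values are made up):

#!/bin/sh
STATUS=$1

if [ "$STATUS" = "ok" ]; then
  echo "all good"
elif [ "$STATUS" = "warn" ]; then
  echo "check the logs"
else
  echo "unknown status"
fi

case "$STATUS" in
  ok)   echo "continuing";;
  warn) echo "continuing with warnings";;
  *)    echo "stopping";;
esac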

Loop Control Structures: while, until, for, for-in, select

  while command do command done
      while executes an action as long as its test command is true.

  until command do command done
      until executes an action as long as its test command is false.

  for variable in list-values do command done
      for-in is designed for use with lists of values; the variable operand is assigned each value in the list in turn.

  for variable do command done
      for is designed for referencing script arguments; the variable operand is assigned each argument value in turn.
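And a short sketch of the for-in and while forms; the file names and loop bound are illustrative:

#!/bin/sh
# for-in walks an explicit list of values
for f in Script1.ksh Script2.sh Script3.sh; do
  echo "processing $f"
done

# while repeats as long as its test command succeeds
i=1
while [ "$i" -le 3 ]; do
  echo "iteration $i"
  i=$((i + 1))
done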

In this lab, we focus on handling embedded BTEQ and MLOAD code inside shell scripts such as .ksh or .sh files, a scenario that is very common in practice.

Embedded BTEQ Code and MLOAD Code inside shell scripts like .ksh or .sh

BTEQ (Basic TEradata Query) is the Teradata utility used to execute queries in either batch or interactive mode; it is also used to import and export data.

MLOAD (MultiLoad) can likewise perform different types of tasks, such as INSERT, DELETE, UPDATE, and UPSERT operations. It can load up to five tables at a time and perform up to 20 DML operations in a script.

In shell script files, you can recognize embedded BTEQ and MLOAD code when the bteq, $BTEQ_COMMAND, or mload command is invoked in the script like this:

$BTEQ_COMMAND << EOB
...
EOB
# or 
bteq << EOF
...
EOF
# or
mload <<!
...
!

Generally, the script uses a << EOF ... EOF here-document to feed the enclosed contents to the command. The delimiter word EOF stands for End Of File, but any word is accepted as long as it does not appear inside the block; EOF, EOB, and EOC are common choices. Note that in the mload example above, the delimiter used is ! .
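One general shell detail worth knowing (this is standard here-document behavior, not something specific to these helpers): when the delimiter word is unquoted, the shell expands variables inside the block before the utility ever sees them, which is how values like $LOGON or ${STAGE_DB_NAME} reach BTEQ or MLOAD. Quoting the delimiter suppresses that expansion:

cat << EOF
expanded by the shell before cat sees it: $HOME
EOF

cat << 'EOF'
passed through literally: $HOME
EOF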


To solve the embedded-code situation, we provide some Python helper scripts that will accompany you throughout the process. These scripts extract the embedded code from the shell script files into separate files, leaving markers behind so the shell script can later be merged back with the converted code generated by SnowConvert for Teradata.


These helpers cover the patterns we saw before ($BTEQ_COMMAND <<, bteq <<, mload <<), and their code is open, so you can extend them with other functionalities or patterns found in your own scripts with embedded code. Contact us if you have feedback on these helpers; it will help us improve!

After that, we will run an assessment with Mobilize.Net RapidScan on both the original code and the resulting code, to measure the impact of the process we use to handle embedded code in shell scripts and to gain more clarity about it.


Overview

To set up and run the extraction scripts, we will go through four steps:

  1. Prepare the environment for the lab: clone the Python helper scripts used throughout, and create the folders we will use.
  2. Use the helper scripts to extract the embedded BTEQ and MLOAD code into separate files with the corresponding extensions.
  3. Let SnowConvert process the resulting files and convert them.
  4. Restore: merge the Python files generated by SnowConvert back into the shell scripts from which the BTEQ or MLOAD code was removed.

For this lab, we use three different shell scripts. Script 1 is a ksh script with embedded BTEQ code (invoked via $BTEQ_COMMAND) that inserts given values into the table AdventureWorksDW.DimCurrency using environment variables. Script 2 is a sh script with embedded BTEQ code that exports a query result to a text file. Script 3 is a sh script with embedded MLOAD code that also inserts given values into AdventureWorksDW.DimCurrency using environment variables.
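To set expectations, here is a rough, hypothetical sketch of what Script1.ksh could look like, based on the description above and the $BTEQ_COMMAND pattern shown earlier; the actual file in the repository may differ, and the CURRENCY_* environment variables are assumed to be exported beforehand:

#!/bin/ksh
# BTEQ_COMMAND holds the utility to invoke; the heredoc below is fed to it
BTEQ_COMMAND=bteq
$BTEQ_COMMAND << EOB
.logon <systemid>/<userid>,<password>;
INSERT INTO AdventureWorksDW.DimCurrency
("CurrencyKey", "CurrencyAlternateKey", "CurrencyName")
VALUES ($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME);
.logoff;
.quit;
EOB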

Step 1: Download scripts

In this scenario, you can use these Python helper scripts. Clone the repository and then change into it in the terminal:

git clone https://github.com/MobilizeNet/TeradataExportScripts
cd TeradataExportScripts

Then we need to create two different folders: one for the input files we are going to work with, and one for the output files.

mkdir inputFiles
mkdir outputFiles

Then we need to copy the embedded BTEQ and MLOAD shell files Script1.ksh, Script2.sh, and Script3.sh (located here) into the new inputFiles directory.
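For example, assuming the three sample scripts were downloaded to your Downloads folder (the source path is hypothetical):

cp ~/Downloads/Script1.ksh ~/Downloads/Script2.sh ~/Downloads/Script3.sh inputFiles/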

Step 2: Download Mobilize.Net RapidScan

You can download Mobilize.Net RapidScan here to process your files with mixed languages. Complete the form and click the Download Now button to start the download.


Step 3: Begin the Assessment

Step 3.1 - Open Mobilize.Net RapidScan, browse to the inputFiles directory we created before, where the embedded BTEQ files are located, and then click Start Scan.


Step 3.2 - You can click on Show output folder. In this folder you will find a RapidScan-inputFiles-.zip; if you open it, you will find a folder called GenericScannerOutput, and inside it a CSV file called FilesInventory. This file shows you the Content Lines of each shell file. Remember these Content Lines amounts; we will use them later.


Step 3.3 - Get some measure of the files; we will compare against these numbers after the conversion.

Step 4: Extracting snippets from code

From the TeradataExportScripts directory, we are going to use extract_snippets.py to process our sample files.

To access the help notes of the extract_snippets.py file, you can run

python extract_snippets.py -h

To extract our sample files you can run

python extract_snippets.py --inputdir inputFiles --outdir outputFiles

This script generates several files. For each shell script in the inputFiles directory, it generates a *.pre.sh file and a *.snippet.1.bteq file for embedded BTEQ code, and a *.mload.pre.sh file and a *.mload.snippet.1.mload file for embedded MLOAD code.

In the code below you can see the original code and its corresponding extraction.

#### File: Script3.sh
STAGE_DB_NAME=${ENVDB}_STG
mload <<!
$LOGON;
INSERT INTO ${STAGE_DB_NAME}."DimCurrency"
("CurrencyKey", "CurrencyAlternateKey", "CurrencyName")
VALUES
($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME);
.END MLOAD;
.LOGOFF;
QUIT;
!
#### Script3.sh.pre.sh

STAGE_DB_NAME=${ENVDB}_STG
@@SNIPPET1_MultiLoad.py

Notice how the MLOAD code was replaced with the @@SNIPPET syntax within the script.

#### Script3.sh.snippet.1.mload
;/*Not supported command from variable $LOGON;*/
INSERT INTO ${STAGE_DB_NAME}."DimCurrency"
("CurrencyKey", "CurrencyAlternateKey", "CurrencyName")
VALUES
($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME);
.END MLOAD;
.LOGOFF;
QUIT;

Step 5: Converting

Now you can feed the migration tool with those BTEQ and MLOAD files. In SnowConvert, set the Input directory to the outputFiles directory, where the BTEQ and MLOAD files are located, and leave the Output directory at its default, outputFiles_out.


Step 5.1: Pasting Output

As we can see, the conversion is complete. Now we need to copy the generated *.py files from the output directory (outputFiles_out\Output) and paste them into the outputFiles directory.
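From a terminal, that copy can be done, for example, like this (assuming the default SnowConvert output layout):

cp outputFiles_out/Output/*.py outputFiles/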

Step 6: Restoring

After migrating with SnowConvert for Teradata, and again from the TeradataExportScripts directory, we are going to work with the SnowConvert for Teradata output. Just run:

python restore_snippets.py --inputdir outputFiles

And it will rebuild your original file, replacing your BTEQ fragments with:

STAGE_DB_NAME=${ENVDB}_STG
result=$(python <<END_SNOWSCRIPT
#*** Generated code is based on the SnowConvert Python Helpers version 2.0.6 ***

import os
import sys
import snowconvert.helpers
from snowconvert.helpers import Export
from snowconvert.helpers import exec
con = None
def main():
  snowconvert.helpers.configure_log()
  con = snowconvert.helpers.log_on()
  exec("""
INSERT INTO ${STAGE_DB_NAME}.PUBLIC."DimCurrency" ("CurrencyKey", "CurrencyAlternateKey", "CurrencyName") VALUES ($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME)
""")
  if con is not None:
    con.close()
    con = None
  #** MSC-WARNING - MSCEWI1002 - REMOVED NEXT STATEMENT, NOT APPLICABLE IN SNOWFLAKE.
  #QUIT **
  snowconvert.helpers.quit_application()

if __name__ == "__main__":
  main()
END_SNOWSCRIPT
)

Compare assessment results after conversion

Apply the splitting.

After extracting the .sh and .ksh files, you will have noticed that several files were generated. For each shell script in the inputFiles directory, the extraction produced a *.pre.sh file and a *.snippet.1.bteq file for embedded BTEQ code, and a *.mload.pre.sh file and a *.mload.snippet.1.mload file for embedded MLOAD code.

Run RapidScan again to determine how much code was migrated and what it ended up looking like.

Step 1 - Open Mobilize.Net RapidScan again and browse to the outputFiles directory we created before, where the extracted files are located; then click Start Scan.

Step 2 - Open the FilesInventory.csv file again, as we did in Step 3 (as we recommended before, you can also use Microsoft Excel). Here are both files, so you can easily compare what happened during the process we ran.

Original Code Rapid Scan Inventory

Converted Code Rapid Scan Inventory

Here you can analyze several things:

  1. A lot of new files were created! You can easily see that the number of files increased from 3 to 14. This is because the code files were split into different files (such as the .pre, .py, .bteq, or .mload files). There are also some .csv files describing the results of the split.
  2. The files highlighted in green are the final results. As you can see, their ContentLines values (code lines) increased in comparison to the original ones. This is because the generated .py code was merged into the new code.
  3. If you add the ContentLines values of the .pre files (yellow) to the .bteq or .mload ContentLines values (blue), you get the original file's count minus one, as shown in the check below. This is because these kinds of snippets require two lines (one to open the block and one to close it), while in the .pre files the extract_snippets.py script replaces the whole block with a single line of @@SNIPPET# syntax.
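If you want to spot-check that line accounting yourself without RapidScan, counting non-blank lines is a rough stand-in for ContentLines. This is a hedged approximation: RapidScan's exact counting rules may differ, and the output file names here follow the pattern shown earlier in the lab:

# non-blank line counts for the original file and its split pieces
grep -c . inputFiles/Script3.sh
grep -c . outputFiles/Script3.sh.pre.sh
grep -c . outputFiles/Script3.sh.snippet.1.mload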

Teradata Export Scripts

UNIX shell scripting with ksh/bash

Embedding Other Languages in BASH Scripts

Control Structures

SnowConvert Official Documentation

SnowConvert for Teradata

Mobilize.Net RapidScan