Welcome to the Teradata Embedded BTEQ Lab. The main purpose of this lab is to learn about Shell scripting, embedded code, and how to handle these situations with SnowConvert.
This lab was created because we can encounter embedded BTEQ and MLOAD code in shell files. Some commands allow users to embed other scripting languages into a BASH script; in this case, BTEQ code.
For example, something like this:
echo 'unix command'
echo 'unix command'
bteq << EOF
.logon <systemid>/<userid>,<password>;
select current_timestamp;
.logoff;
.quit;
EOF
echo 'unix command 3'
echo 'unix command 4'
echo 'unix command 5'
The main reason for embedding code is that the primary programming language cannot perform a task as efficiently as the secondary (embedded) language. However, this approach has some disadvantages: it requires programmers and editors to understand more than one programming language, it requires additional interpreters, and it puts the context of the code at risk.
It is like a conversation: if you are talking with a friend in English and suddenly switch to German, unless your friend also speaks German, she might have trouble understanding you. Another example is a Chinese text that quotes some sentences in Spanish. A Chinese translator expects all sentences to be in Chinese; when it encounters the Spanish sentences, it will not know what to do.
For Teradata migrations, SnowConvert understands the Teradata SQL dialect as well as the BTEQ dialect, and when converting BTEQ or MLOAD it can detect and switch between those languages. However, SnowConvert does not understand shell scripting, so it cannot determine which sections of a file are written in shell and which are SQL or BTEQ. SnowConvert applies several recovery mechanisms, but the quality of the results will not be optimal.
Our Data Engineering group has found that these code mixes are very common. Sometimes your source files have an extension like SQL or BTEQ, but if you look inside them, their content is actually some kind of shell script. One way to find out is to look for a shebang, which indicates that the file is a shell script.
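As an illustration (a minimal sketch; the file name suspect.sql and its contents are made up for this example), you can check the first line of a file for a shebang:

```shell
# Create a file that pretends to be SQL but is actually a Korn shell script.
printf '#!/bin/ksh\necho "hello"\n' > suspect.sql

# A file whose first line starts with "#!" is a shell script,
# regardless of its extension.
first_line=$(head -n 1 suspect.sql)
case "$first_line" in
  '#!'*) echo "suspect.sql is a shell script in disguise" ;;
  *)     echo "suspect.sql looks like plain SQL" ;;
esac
```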
Does SnowConvert handle these situations at all?
A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter. Typical operations performed by shell scripts include file manipulation, program execution, and printing text.
In other words, a shell script is a text file containing commands that could have been typed directly into the shell. There is no difference in syntax between interactive command-line use and placing the commands in a file. Some commands are only useful when used interactively (e.g., command-line history recall), while other commands are too complex to use interactively.
Any shell can be used for writing a shell script. To allow for this, the first line of every script is:
#!/path/to/shell
#!/bin/ksh
The #! characters (also known as the shebang) tell the system to locate the following pathname, start it up, and feed it the rest of the file as input. Any program that can read commands from a file can be started this way, as long as it recognizes the # comment convention. The program is started, and the script file is then given to it as an argument. Because of this, the script must be readable as well as executable.
sh: The "Bourne" shell. Small, simple, and (originally) with very few internal commands, so it called external programs for even the simplest of tasks. It is always available on everything that looks vaguely like Unix.

csh: The "C" shell. It has many things in common with the Bourne shell, but many enhancements to improve interactive use. The internal commands used only in scripts are very different from sh, and similar (by design) to the "C" language syntax.

tcsh: The "TC" shell. Freely available and based on csh. It has many additional features to make interactive use more convenient.

ksh: The "Korn" shell. Written as a major upgrade to sh and backward compatible with it, but with many internal commands for the most frequently used functions. It also incorporates many of the features from tcsh which enhance interactive use (command-line history recall, etc.).

bash: The "Bourne again" shell. Written as part of the GNU/Linux Open Source effort, and the default shell for Linux and Mac OS X. It is a functional clone of sh, with additional features to enhance interactive use, add POSIX compliance, and partial ksh compatibility.
This particular lab concentrates on the Bourne and the Korn shells.
Environment variables are commonly used within the shell. It is also a common means of configuring services and handling application secrets.
To set an environment variable the export command is used. We give the variable a name, which is what is used to access it in shell scripts and configurations and then a value to hold whatever data is needed in the variable.
export MY_VALUE=123
To output the value of the environment variable from the shell, we use the echo command and prepend the variable's name with a dollar ($) sign.
echo $MY_VALUE
To unset an environment variable, which removes it altogether, we use the unset command. Simply replacing the environment variable with an empty string will not remove it, and in most cases will likely cause problems for scripts or applications that expect a valid value.
The following syntax is used to unset an environment variable:
unset VARIABLE_NAME
You can access those variables inside your shell script. As we saw before when outputting the value of an environment variable from the shell, you just need to reference the variable by prepending a dollar ($) sign to its name in your shell script file.
ENV_VARIABLE=$MY_VALUE
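Putting the commands above together, here is a minimal sketch (the variable names are illustrative):

```shell
# Set an environment variable with export.
export MY_VALUE=123

# Read it back by prepending a dollar sign.
echo "MY_VALUE is $MY_VALUE"        # MY_VALUE is 123

# Copy it into another variable inside the script.
ENV_VARIABLE=$MY_VALUE
echo "ENV_VARIABLE is $ENV_VARIABLE"  # ENV_VARIABLE is 123

# unset removes the variable entirely; it now expands to an empty string.
unset MY_VALUE
echo "after unset: '$MY_VALUE'"     # after unset: ''
```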
There are two kinds of control structures: loops and conditions. The shell has three loop control structures: while, for, and for-in. There are two condition structures: if and case. As in any other programming language, you can compare integers, compare strings, and even perform logical operations. The following table lists some of the commonly used options and operators.
| Operator | Function |
| --- | --- |
| **Integer Comparisons** | |
| -gt | Greater-than |
| -lt | Less-than |
| -ge | Greater-than-or-equal-to |
| -le | Less-than-or-equal-to |
| -eq | Equal |
| -ne | Not-equal |
| **String Comparisons** | |
| -z | Tests for empty string |
| = | Equal strings |
| != | Not-equal strings |
| **Logical Operations** | |
| -a | Logical AND |
| -o | Logical OR |
| ! | Logical NOT |
| **File Tests** | |
| -f | File exists and is a regular file |
| -s | File is not empty |
| -r | File is readable |
| -w | File can be written to, modified |
| -x | File is executable |
| -d | Filename is a directory name |
The Shell has a set of conditional control structures that allow you to choose what commands to execute. Many of these are similar to conditional control structures found in programming languages, but there are some differences.
| Control Structure | Function |
| --- | --- |
| **Condition Control Structures: if, else, elif, case** | |
| `if command then command fi` | if executes an action if its test command is true. |
| `if command then command else command fi` | if-else executes an action if the exit status of its test command is true; if false, the else action is executed. |
| `if command then command elif command then command else command fi` | elif allows you to nest if structures, enabling selection among several alternatives; at the first true if structure, its commands are executed and control leaves the entire elif structure. |
| `case string in pattern) command;; esac` | case matches the string value against several patterns; if a pattern is matched, its associated commands are executed. |
| **Loop Control Structures: while, until, for, for-in, select** | |
| `while command do command done` | while executes an action as long as its test command is true. |
| `until command do command done` | until executes an action as long as its test command is false. |
| `for variable in list-values do command done` | for-in is designed for use with lists of values; the variable operand is consecutively assigned the values in the list. |
| `for variable do command done` | for is designed to reference script arguments; the variable operand is consecutively assigned each argument value. |
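The structures and operators above can be sketched in a short script (the values and variable names are illustrative):

```shell
count=3

# if/elif/else with integer comparisons (-gt, -ge).
if [ "$count" -gt 5 ]; then
  size="big"
elif [ "$count" -ge 1 ]; then
  size="small"
else
  size="empty"
fi
echo "size=$size"            # size=small

# case: match a string value against patterns.
case "$size" in
  big)   label="large batch" ;;
  small) label="small batch" ;;
  *)     label="unknown" ;;
esac
echo "label=$label"          # label=small batch

# while loop: runs as long as its test command is true.
while [ "$count" -gt 0 ]; do
  count=$((count - 1))
done
echo "count=$count"          # count=0

# for-in loop: the variable takes each value in the list in turn.
total=0
for n in 1 2 3 4; do
  total=$((total + n))
done
echo "total=$total"          # total=10
```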
In this lab, we focus on handling embedded BTEQ and MLOAD code inside shell scripts such as .ksh or .sh files, a scenario that is very common to encounter.
BTEQ (Basic TEradata Query) is the Teradata utility used to execute queries in either batch or interactive mode; it is also used to import and export data.
MLOAD (MultiLoad) can likewise perform different types of tasks, such as INSERT, DELETE, UPDATE, and UPSERT. It can load up to 5 tables at a time and perform up to 20 DML operations in a script.
In shell script files, you can see BTEQ and MLOAD code embedded when the bteq, $BTEQ_COMMAND, or mload command is invoked in the script, like this:
$BTEQ_COMMAND << EOB
...
EOB
# or
bteq << EOF
...
EOF
# or
mload <<!
...
!
Generally, the script uses a << EOF ... EOF block to wrap the contents passed to the command. The flag word EOF stands for End Of File; however, any word is accepted as long as it is unique in the script file. Commonly used words are EOF, EOB, EOC, etc. In particular, with mload you will have noticed that the flag used is !.
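To see how the delimiter word works, here is a minimal sketch using cat in place of bteq or mload (which are not available outside a Teradata environment):

```shell
# The delimiter word is arbitrary: EOF, EOB, or even "!" all work,
# as long as the closing word appears alone on its own line.
out1=$(cat << EOF
select current_timestamp;
EOF
)

out2=$(cat <<!
.END MLOAD;
!
)

echo "$out1"   # select current_timestamp;
echo "$out2"   # .END MLOAD;
```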
To handle embedded code, we provide some Python helper scripts that will accompany you throughout the process. These scripts extract the embedded code from the shell script files into separate files, leaving markers that are used later to merge the code converted by SnowConvert for Teradata back into the shell scripts.
These helpers cover the patterns we saw before ($BTEQ_COMMAND <<, bteq <<, mload <<), and their code is open to adding other functionality or patterns you may have in other scripts with embedded code. You can contact us if you have feedback about these helpers; it will help us improve!
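As a rough illustration of the kind of pattern these helpers look for (this is only a sketch, not the actual extract_snippets.py logic; the file sample.sh is made up for this example), you can detect embedded blocks with grep:

```shell
# Create a sample shell file with an embedded bteq block (illustrative content).
cat > sample.sh << 'OUTER'
echo 'unix command'
bteq << EOF
select current_timestamp;
EOF
OUTER

# Look for lines that start an embedded block: bteq <<, mload <<,
# or $BTEQ_COMMAND <<. This approximates what the helper scripts
# detect; it is not their actual implementation.
matches=$(grep -cE '^[[:space:]]*(bteq|mload|\$BTEQ_COMMAND)[[:space:]]*<<' sample.sh)
echo "embedded blocks found: $matches"   # embedded blocks found: 1
```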
After that, we will run an assessment using Mobilize.Net RapidScan on both the original code and the resulting code, to see the impact of the process we use to handle embedded code in shell scripts and to get more clarity about it.
To set up the extraction scripts, we need to go through four different processes:
1. Setup: prepare the environment for this lab by cloning the Python helper scripts used throughout the lab and creating the folders we will use.
2. Extraction: use the helper scripts to extract the embedded bteq and mload code into separate files with the corresponding extensions.
3. Conversion: let SnowConvert for Teradata process the resulting files and convert them.
4. Restoration: merge the Python files generated by SnowConvert back into the shell scripts that no longer contain the bteq or mload code.
For this lab, we are using three different shell scripts. Script 1 is a ksh script with embedded bteq code, using $BTEQ_COMMAND as the flag, that inserts given values into the table AdventureWorksDW.DimCurrency using environment variables. Script 2 is an sh script with embedded bteq code that extracts a query result into a text file. Script 3 is an sh file with embedded mload code that also inserts given values into the table AdventureWorksDW.DimCurrency using environment variables.
In this scenario, you can use these Python helper scripts. Clone the repository and then change into its directory in the terminal:
git clone https://github.com/MobilizeNet/TeradataExportScripts
cd TeradataExportScripts
Then we need to create two different folders: one for the input files we are going to work with, and one for the output files.
mkdir inputFiles
mkdir outputFiles
Then, copy the embedded bteq and mload shell files Script1.ksh, Script2.sh, and Script3.sh located here into the new inputFiles directory.
You can download Mobilize.Net RapidScan here to process your files with mixed languages. Complete the form and click the Download Now button to start the download.
Step 3.1 - Open Mobilize.Net RapidScan, browse to the inputFiles directory we created before, where the embedded bteq files are located, and then click Start Scan.
Step 3.2 - Click Show output folder. In this folder you will find a RapidScan-inputFiles- folder; inside it there is a folder called GenericScannerOutput, and inside that folder you will find a CSV file called FilesInventory. This file shows you the Content Lines of each shell file. Remember these Content Lines amounts; we will use them later.
From the TeradataExportScripts directory, we are going to use extract_snippets.py to extract our sample files.
To access the help notes of extract_snippets.py, you can run:
python extract_snippets.py -h
To extract our sample files you can run
python extract_snippets.py --inputdir inputFiles --outdir outputFiles
These scripts will generate several files for each shell script in the inputFiles directory:

*.pre.sh: The original shell files, with replacements for the snippets that are to be migrated. These replacements are identified with a @@SNIPPET syntax. (For embedded mload files, the name is *.mload.pre.sh.)

*.snippet.X.bteq / *.snippet.X.mload: The bteq or mload code snippets that will be converted to Python code. The X represents the consecutive number of the snippet within the file.

In the code below you will be able to see the original code and its corresponding conversion.
#### File:Script3.ksh
STAGE_DB_NAME=${ENVDB}_STG
mload <<!
$LOGON;
INSERT INTO ${STAGE_DB_NAME}."DimCurrency"
("CurrencyKey", "CurrencyAlternateKey", "CurrencyName")
VALUES
($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME);
.END MLOAD;
.LOGOFF;
QUIT;
!
#### Script3.ksh.pre.ksh
STAGE_DB_NAME=${ENVDB}_STG
@@SNIPPET1_MultiLoad.py
Notice how the MLOAD code was changed to the @@SNIPPET Syntax within the script.
#### Script3.sh.snippet.1.mload
;/*Not supported command from variable $LOGON;*/
INSERT INTO ${STAGE_DB_NAME}."DimCurrency"
("CurrencyKey", "CurrencyAlternateKey", "CurrencyName")
VALUES
($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME);
.END MLOAD;
.LOGOFF;
QUIT;
Now you can feed the migration tool with those bteq and mload files. In SnowConvert, set the Input directory to the outputFiles directory, where the bteq and mload files are located; for the Output directory, you can keep the default outputFiles_out.
As we can see, the conversion is complete. Now we need to copy the generated *.py files from the output directory (outputFiles_out\Output) and paste them into the outputFiles directory.
After the migration with SnowConvert for Teradata, still in the TeradataExportScripts directory, run the following to work with the SnowConvert output:
python restore_snippets.py --inputdir outputFiles
This will rebuild your original file, replacing your bteq fragments with:
STAGE_DB_NAME=${ENVDB}_STG
result=$(python <<END_SNOWSCRIPT
#*** Generated code is based on the SnowConvert Python Helpers version 2.0.6 ***
import os
import sys
import snowconvert.helpers
from snowconvert.helpers import Export
from snowconvert.helpers import exec
con = None
def main():
snowconvert.helpers.configure_log()
con = snowconvert.helpers.log_on()
exec("""
INSERT INTO ${STAGE_DB_NAME}.PUBLIC."DimCurrency" ("CurrencyKey", "CurrencyAlternateKey", "CurrencyName") VALUES ($CURRENCY_KEY, $CURRENCY_ALTERNATE_KEY, $CURRENCY_NAME)
""")
if con is not None:
con.close()
con = None
#** MSC-WARNING - MSCEWI1002 - REMOVED NEXT STATEMENT, NOT APPLICABLE IN SNOWFLAKE.
#QUIT **
snowconvert.helpers.quit_application()
if __name__ == "__main__":
main()
END_SNOWSCRIPT
)
After extracting the sh and ksh files, you will have generated several files. For each shell script in the inputFiles directory, the process generated a *.pre.sh file and a *.snippet.1.bteq file for embedded bteq code, and a *.mload.pre.sh file and a *.mload.snippet.1.mload file for embedded mload code.
Step 1 - Open Mobilize.Net RapidScan again, browse to the outputFiles directory we created before, where the extracted files are located, and then click Start Scan.
Step 2 - Open the FilesInventory.csv file again, as we did in Step 3 (as we recommended before, you can also use Microsoft Excel). Here are both files, so you can easily compare what happened during the process we ran.
Original Rapid Scan Inventory
Converted Code Rapid Scan Inventory
Here you can analyze several things:

- The number of files increased (there are now .pre, .py, .bteq, or .mload files). There are also some .csv files referring to the results of the split.
- The ContentLines values (code lines) increased in comparison to the original ones. This is because the .py code was included in the new code.
- If you add the ContentLines from the .pre files (yellow) to the .bteq or .mload ContentLines values (blue), you'll get the same number as the original file minus one. This is because these kinds of snippets require 2 lines (one for the beginning of the block and one for the end of the block), while in the .pre files the extract_snippets.py script replaces everything with just one line of @@SNIPPET# syntax.

UNIX shell scripting with ksh/bash
Embedding Other Languages in BASH Scripts