utilities

Download and management utilities for syncing time and auxiliary files

  • Can list a directory on an ftp host

  • Can download a file from an ftp or http host

  • Can download a file from CDDIS via https when NASA Earthdata credentials are supplied

  • Checks MD5 or sha1 hashes between local and remote files

General Methods

SMBcorr.utilities.get_data_path(relpath)[source]

Get the absolute path within a package from a relative path

Parameters:
relpath: str

relative path

SMBcorr.utilities.import_dependency(name: str, extra: str = '', raise_exception: bool = False)[source]

Import an optional dependency

Adapted from pandas.compat._optional::import_optional_dependency

Parameters:
name: str

Module name

extra: str, default “”

Additional text to include in the ImportError message

raise_exception: bool, default False

Raise an ImportError if the module is not found

Returns:
module: obj

Imported module
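
The optional-import pattern described above can be sketched with the standard library alone. This is not the SMBcorr implementation: the name `import_optional` and the choice to return `None` for a missing module are assumptions made for illustration.

```python
import importlib


def import_optional(name, extra="", raise_exception=False):
    """Try to import a module by name (hypothetical stand-in helper)."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        if raise_exception:
            # Append the caller-supplied hint to the error message
            message = f"Missing optional dependency '{name}'. {extra}".strip()
            raise ImportError(message) from exc
        # Assumption of this sketch: a missing module is signalled with None
        return None
```

With `raise_exception=False` the caller decides later whether the missing module matters, which keeps heavy dependencies optional at import time.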

SMBcorr.utilities.get_hash(local, algorithm='MD5')[source]

Get the hash value from a local file or BytesIO object

Parameters:
local: obj or str

BytesIO object or path to file

algorithm: str, default ‘MD5’

hashing algorithm for checksum validation

  • 'MD5': Message Digest

  • 'sha1': Secure Hash Algorithm
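
As a rough illustration of hashing either an in-memory buffer or a file on disk, the following sketch uses `hashlib` directly. The function name `file_hash` and the 64 KiB block size are assumptions, and the library's own handling of inputs may differ.

```python
import hashlib
import io
import pathlib


def file_hash(local, algorithm="MD5"):
    """Return the hex digest of a BytesIO object or of a file on disk."""
    # hashlib.new accepts lowercase algorithm names such as "md5" and "sha1"
    digest = hashlib.new(algorithm.lower())
    if isinstance(local, io.BytesIO):
        # In-memory buffer: hash the full contents without moving the cursor
        digest.update(local.getvalue())
    else:
        # Path on disk: read in blocks so large files are not loaded at once
        with pathlib.Path(local).expanduser().open("rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                digest.update(block)
    return digest.hexdigest()
```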

SMBcorr.utilities.url_split(s)[source]

Recursively split a url path into a list

Parameters:
s: str

url string
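
One way to split a url recursively is to peel off the last component with `posixpath.split` until only the scheme and host remain. The sketch below is an assumption about how such a helper could work, not a copy of the library code; `split_url` and the list of recognized schemes are illustrative.

```python
import posixpath


def split_url(s):
    """Recursively split a url or posix path into its components."""
    head, tail = posixpath.split(s)
    if head in ("http:", "https:", "ftp:", "s3:"):
        # Reached the scheme and host: keep them together as one element
        return [s]
    if head in ("", posixpath.sep) or head == s:
        # Reached the first component, or the path cannot be reduced further
        return [tail]
    return split_url(head) + [tail]
```

For example, `split_url("https://example.com/a/b.txt")` yields the host with its scheme followed by each path component.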

SMBcorr.utilities.get_unix_time(time_string, format='%Y-%m-%d %H:%M:%S')[source]

Get the Unix timestamp value for a formatted date string

Parameters:
time_string: str

formatted time string to parse

format: str, default ‘%Y-%m-%d %H:%M:%S’

format for input time string
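
A minimal sketch of the conversion, assuming the input string is expressed in UTC (the documentation above does not state the time zone). The name `unix_time` is hypothetical.

```python
import calendar
import time


def unix_time(time_string, format="%Y-%m-%d %H:%M:%S"):
    """Parse a formatted date string as UTC and return seconds since 1970-01-01."""
    parsed = time.strptime(time_string.strip(), format)
    # calendar.timegm treats the struct_time as UTC (time.mktime would use local time)
    return calendar.timegm(parsed)
```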

SMBcorr.utilities.isoformat(time_string)[source]

Reformat a date string to ISO formatting

Parameters:
time_string: str

formatted time string to parse

SMBcorr.utilities.even(value)[source]

Rounds a number down to the nearest even number less than or equal to the original value

Parameters:
value: float

number to be rounded

SMBcorr.utilities.ceil(value)[source]

Rounds a number upward to its nearest integer

Parameters:
value: float

number to be rounded upward
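
The two rounding helpers above can be approximated with `math.floor` and `math.ceil`. The names `even_floor` and `ceil_int` are illustrative, and the treatment of negative inputs is an assumption based on the "less than or equal to" wording.

```python
import math


def even_floor(value):
    """Largest even integer less than or equal to value."""
    return 2 * math.floor(value / 2)


def ceil_int(value):
    """Smallest integer greater than or equal to value."""
    return math.ceil(value)
```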

SMBcorr.utilities.copy(source, destination, move=False, **kwargs)[source]

Copy or move a file with all system information

Parameters:
source: str

source file

destination: str

copied destination file

move: bool, default False

remove the source file
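
Copying "with all system information" is sketched here as a content copy followed by a metadata copy using `shutil`. This is an assumption about the approach, and `copy_file` is a hypothetical name.

```python
import os
import shutil


def copy_file(source, destination, move=False):
    """Copy a file with its metadata, optionally removing the source."""
    source = os.path.abspath(os.path.expanduser(source))
    destination = os.path.abspath(os.path.expanduser(destination))
    # Copy the contents, then the permission bits and timestamps
    shutil.copyfile(source, destination)
    shutil.copystat(source, destination)
    if move:
        # A move is a copy followed by deleting the original
        os.remove(source)
    return destination
```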

SMBcorr.utilities.check_ftp_connection(HOST, username=None, password=None)[source]

Check internet connection with ftp host

Parameters:
HOST: str

remote ftp host

username: str or NoneType

ftp username

password: str or NoneType

ftp password

SMBcorr.utilities.ftp_list(HOST, username=None, password=None, timeout=None, basename=False, pattern=None, sort=False)[source]

List a directory on an ftp host

Parameters:
HOST: str or list

remote ftp host path split as list

username: str or NoneType

ftp username

password: str or NoneType

ftp password

timeout: int or NoneType, default None

timeout in seconds for blocking operations

basename: bool, default False

return the file or directory basename instead of the full path

pattern: str or NoneType, default None

regular expression pattern for reducing list

sort: bool, default False

sort output list

Returns:
output: list

items in a directory

mtimes: list

last modification times for items in the directory
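
The `basename`, `pattern` and `sort` options can be illustrated offline on a listing that has already been retrieved. The sketch assumes the pattern is matched against the item name and that sorting orders by name; neither detail is confirmed by the text above, and `reduce_listing` is a hypothetical helper.

```python
import posixpath
import re


def reduce_listing(paths, mtimes, basename=False, pattern=None, sort=False):
    """Filter and order a directory listing and its modification times."""
    items = list(zip(paths, mtimes))
    if pattern:
        # Keep only entries whose name matches the regular expression
        items = [(p, m) for p, m in items
                 if re.search(pattern, posixpath.basename(p))]
    if basename:
        # Drop the directory part of each entry
        items = [(posixpath.basename(p), m) for p, m in items]
    if sort:
        # Assumption: order by item name; names and times stay aligned
        items.sort(key=lambda item: item[0])
    output = [p for p, _ in items]
    times = [m for _, m in items]
    return output, times
```

Pairing each name with its modification time before filtering keeps the two returned lists in step.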

SMBcorr.utilities.from_ftp(HOST, username=None, password=None, timeout=None, local=None, hash='', chunk=8192, verbose=False, fid=sys.stdout, mode=0o775)[source]

Download a file from an ftp host

Parameters:
HOST: str or list

remote ftp host path

username: str or NoneType

ftp username

password: str or NoneType

ftp password

timeout: int or NoneType, default None

timeout in seconds for blocking operations

local: str or NoneType, default None

path to local file

hash: str, default ‘’

MD5 hash of local file

chunk: int, default 8192

chunk size for transfer encoding

verbose: bool, default False

print file transfer information

fid: obj, default sys.stdout

open file object to print if verbose

mode: oct, default 0o775

permissions mode of output local file

Returns:
remote_buffer: obj

BytesIO representation of file
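
The interaction between `chunk`, `hash`, `local` and `mode` can be sketched without a network connection by reading from any file-like object. This is an illustration under assumptions (the local write is skipped when the supplied MD5 matches the remote contents), not the library's transfer code, and `fetch_in_chunks` is a hypothetical name. Note that the integer 509 is the decimal form of the octal mode 0o775.

```python
import hashlib
import io
import os


def fetch_in_chunks(remote, local=None, hash="", chunk=8192, mode=0o775):
    """Read a file-like object in chunks; write it locally if the MD5 differs."""
    remote_buffer = io.BytesIO()
    # Pull fixed-size chunks until the source is exhausted
    for block in iter(lambda: remote.read(chunk), b""):
        remote_buffer.write(block)
    remote_buffer.seek(0)
    remote_hash = hashlib.md5(remote_buffer.getvalue()).hexdigest()
    # Skip the write when the caller's local hash already matches
    if local is not None and hash != remote_hash:
        with open(local, "wb") as f:
            f.write(remote_buffer.getvalue())
        os.chmod(local, mode)
    return remote_buffer
```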

SMBcorr.utilities.http_list(HOST, timeout=None, context=ssl.SSLContext(ssl.PROTOCOL_TLS), parser=lxml.etree.HTMLParser(), format='%Y-%m-%d %H:%M', pattern='', sort=False)[source]

List a directory on an Apache http Server

Parameters:
HOST: str or list

remote http host path

timeout: int or NoneType, default None

timeout in seconds for blocking operations

context: obj, default ssl.SSLContext(ssl.PROTOCOL_TLS)

SSL context for urllib opener object

parser: obj, default lxml.etree.HTMLParser()

HTML parser for lxml

format: str, default ‘%Y-%m-%d %H:%M’

format for input time string

pattern: str, default ‘’

regular expression pattern for reducing list

sort: bool, default False

sort output list

Returns:
colnames: list

column names in a directory

collastmod: list

last modification times for items in the directory

SMBcorr.utilities.from_http(HOST, timeout=None, context=ssl.SSLContext(ssl.PROTOCOL_TLS), local=None, hash='', chunk=16384, verbose=False, fid=sys.stdout, mode=0o775)[source]

Download a file from an http host

Parameters:
HOST: str or list

remote http host path split as list

timeout: int or NoneType, default None

timeout in seconds for blocking operations

context: obj, default ssl.SSLContext(ssl.PROTOCOL_TLS)

SSL context for urllib opener object

local: str or NoneType, default None

path to local file

hash: str, default ‘’

MD5 hash of local file

chunk: int, default 16384

chunk size for transfer encoding

verbose: bool, default False

print file transfer information

fid: obj, default sys.stdout

open file object to print if verbose

mode: oct, default 0o775

permissions mode of output local file

Returns:
remote_buffer: obj

BytesIO representation of file

SMBcorr.utilities.build_opener(username, password, context=ssl.SSLContext(ssl.PROTOCOL_TLS), password_manager=False, get_ca_certs=False, redirect=False, authorization_header=True, urs='https://urs.earthdata.nasa.gov')[source]

Build urllib opener for NASA Earthdata with supplied credentials

Parameters:
username: str or NoneType

NASA Earthdata username

password: str or NoneType

NASA Earthdata password

context: obj, default ssl.SSLContext(ssl.PROTOCOL_TLS)

SSL context for urllib opener object

password_manager: bool, default False

Create password manager context using default realm

get_ca_certs: bool, default False

Get list of loaded “certification authority” certificates

redirect: bool, default False

Create redirect handler object

authorization_header: bool, default True

Add base64 encoded authorization header to opener

urs: str, default ‘https://urs.earthdata.nasa.gov’

Earthdata login URS 3 host
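
The `authorization_header` option can be pictured as attaching an HTTP Basic header to a `urllib` opener. The sketch below is a simplified stand-in: it omits the password manager, redirect handler and CA certificate options, uses `ssl.create_default_context()` in place of the documented context, and the name `basic_auth_opener` is hypothetical.

```python
import base64
import http.cookiejar
import ssl
import urllib.request


def basic_auth_opener(username, password, context=None):
    """Build a urllib opener that sends a Basic authorization header."""
    if context is None:
        # Assumption: a default verified context stands in for the documented one
        context = ssl.create_default_context()
    handlers = [
        urllib.request.HTTPSHandler(context=context),
        # Keep session cookies between requests
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()),
    ]
    opener = urllib.request.build_opener(*handlers)
    # Base64 encode "username:password" as used by HTTP Basic authentication
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    opener.addheaders = [("Authorization", f"Basic {token}")]
    return opener
```

Because the header is attached to the opener itself, it is sent with every request made through that opener, so such an opener should only be used against hosts that are meant to receive the credentials.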

SMBcorr.utilities.gesdisc_list(HOST, username=None, password=None, build=False, timeout=None, urs='urs.earthdata.nasa.gov', parser=lxml.etree.HTMLParser(), format='%Y-%m-%d %H:%M', pattern='', sort=False)[source]

List a directory on NASA GES DISC servers

Parameters:
HOST: str or list

remote https host

username: str or NoneType, default None

NASA Earthdata username

password: str or NoneType, default None

NASA Earthdata password

build: bool, default False

Build opener with NASA Earthdata credentials

timeout: int or NoneType, default None

timeout in seconds for blocking operations

urs: str, default ‘urs.earthdata.nasa.gov’

Earthdata login URS 3 host

parser: obj, default lxml.etree.HTMLParser()

HTML parser for lxml

format: str, default ‘%Y-%m-%d %H:%M’

format for input time string

pattern: str, default ‘’

regular expression pattern for reducing list

sort: bool, default False

sort output list

Returns:
colnames: list

column names in a directory

collastmod: list

last modification times for items in the directory

SMBcorr.utilities.cmr_filter_json(search_results, endpoint='data', request_type='application/x-netcdf')[source]

Filter the CMR json response for desired data files

Parameters:
search_results: dict

json response from CMR query

endpoint: str, default ‘data’

url endpoint type

  • 'data': NASA Earthdata https archive

  • 'opendap': NASA Earthdata OPeNDAP archive

  • 's3': NASA Earthdata Cumulus AWS S3 bucket

request_type: str, default ‘application/x-netcdf’

data type for reducing CMR query

Returns:
granule_names: list

Model granule names

granule_urls: list

Model granule urls

granule_mtimes: list

Model granule modification times
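
Filtering by `request_type` can be sketched on a plain dictionary. The layout assumed here (a `feed` containing `entry` items, each with `title`, `updated` and a list of `links` carrying `type` and `href`) is an assumption for illustration, as is the name `filter_granules`. The sketch also ignores the `endpoint` option.

```python
def filter_granules(search_results, request_type="application/x-netcdf"):
    """Collect names, urls and update times for links of one MIME type."""
    names, urls, mtimes = [], [], []
    # Assumed layout: {"feed": {"entry": [{"title", "updated", "links": [...]}]}}
    entries = search_results.get("feed", {}).get("entry", [])
    for entry in entries:
        for link in entry.get("links", []):
            if link.get("type") != request_type:
                continue
            names.append(entry.get("title"))
            urls.append(link["href"])
            mtimes.append(entry.get("updated"))
            # One matching link per granule is enough
            break
    return names, urls, mtimes
```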

SMBcorr.utilities.cmr(short_name, version=None, start_date=None, end_date=None, provider='GES_DISC', endpoint='data', request_type='application/x-netcdf', verbose=False, fid=sys.stdout)[source]

Query the NASA Common Metadata Repository (CMR) for model data

Parameters:
short_name: str

Model shortname in the CMR system

version: str or NoneType, default None

Model version

start_date: str or NoneType, default None

starting date for CMR product query

end_date: str or NoneType, default None

ending date for CMR product query

provider: str, default ‘GES_DISC’

CMR data provider

  • 'GES_DISC': GESDISC

  • 'GESDISCCLD': GESDISC Cumulus

  • 'PODAAC': PO.DAAC Drive

  • 'POCLOUD': PO.DAAC Cumulus

endpoint: str, default ‘data’

url endpoint type

  • 'data': NASA Earthdata https archive

  • 'opendap': NASA Earthdata OPeNDAP archive

  • 's3': NASA Earthdata Cumulus AWS S3 bucket

request_type: str, default ‘application/x-netcdf’

data type for reducing CMR query

verbose: bool, default False

print CMR query information

fid: obj, default sys.stdout

open file object to print if verbose

Returns:
granule_names: list

Model granule names

granule_urls: list

Model granule urls

granule_mtimes: list

Model granule modification times
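
How the query arguments might map onto a granule search url is sketched below. The base url, the query parameter names and the `page_size` value are assumptions of this example rather than details taken from the text above, and `cmr_query_url` is a hypothetical helper that only builds the string without contacting any server.

```python
import urllib.parse


def cmr_query_url(short_name, version=None, start_date=None, end_date=None,
                  provider="GES_DISC", page_size=2000):
    """Assemble a granule search url for the CMR (parameter names assumed)."""
    params = [("provider", provider), ("short_name", short_name),
              ("page_size", str(page_size))]
    if version is not None:
        params.append(("version", version))
    if start_date or end_date:
        # An open-ended range leaves the missing side of the comma empty
        params.append(("temporal", f"{start_date or ''},{end_date or ''}"))
    base = "https://cmr.earthdata.nasa.gov/search/granules.json"
    return base + "?" + urllib.parse.urlencode(params)
```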

SMBcorr.utilities.build_request(short_name, dataset_version, url, variables=[], format='bmM0Lw', service='L34RS_MERRA2', version='1.02', bbox=[-90, -180, 90, 180], **kwargs)[source]

Build requests for the GES DISC subsetting API

Parameters:
short_name: str

Model shortname in the CMR system

dataset_version: str

Model version in the CMR system

url: str

url for granule returned by the CMR system

variables: list, default []

Variables for product to subset

format: str, default ‘bmM0Lw’

Coded output format for GES DISC subsetting API

service: str, default ‘L34RS_MERRA2’

GES DISC subsetting API service

version: str, default ‘1.02’

GES DISC subsetting API service version

bbox: list, default [-90,-180,90,180]

Bounding box to spatially subset

**kwargs: dict, default {}

Additional parameters for GES DISC subsetting API

Returns:
request_url: str

Formatted url for GES DISC subsetting API
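
The default `format` value 'bmM0Lw' decodes as base64 (with its padding stripped) to the string `nc4/`, which the first helper below demonstrates. The second helper sketches how the remaining options could be encoded as a query string. Its upper-case key names are hypothetical and are not taken from the GES DISC API, and the bounding box is assumed to be ordered south, west, north, east based on the default values.

```python
import base64
import urllib.parse


def decode_format(code):
    """Decode a base64 format code whose '=' padding has been stripped."""
    padded = code + "=" * (-len(code) % 4)
    return base64.b64decode(padded).decode("ascii")


def subset_query(short_name, dataset_version, variables=(), format="bmM0Lw",
                 service="L34RS_MERRA2", version="1.02",
                 bbox=(-90, -180, 90, 180), **kwargs):
    """Encode subsetting options as a query string (key names are hypothetical)."""
    params = {
        "SHORTNAME": short_name,
        "DATASET_VERSION": dataset_version,
        "FORMAT": format,
        "SERVICE": service,
        "VERSION": version,
        "BBOX": ",".join(str(b) for b in bbox),
        "VARIABLES": ",".join(variables),
    }
    # Extra keyword arguments pass straight through as additional parameters
    params.update(kwargs)
    return urllib.parse.urlencode(params)
```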