Industry terms

Claire Samuel

B

Big Data
Business intelligence

C

Cognos

D

Data governance
Data migration
Data scientist
DataStage
Data warehouse
DB2

E

ETL (Extract, Transform and Load)
Exadata

I

IBM
Informatica

K

Kalido
Kognitio

M

Master data management
MicroStrategy
MPP
MS SQL Server

N

Netezza

O

OBIEE
Oracle

P

ParAccel

Q

QlikTech

S

SAP Business Objects
SAP Data Integrator
SSAS
SSIS
SSRS
SPSS

T

Talend
Teradata

Big Data

Big Data is a term with a number of possible meanings, depending on who is using it. In its purest form it can mean just that, large amounts of data, but these days it is more likely to be used to describe ‘new data’, such as social media feeds and online gaming data. Look at Tesco: 11 years ago they bought the consultancy that crunched the customer data behind their famous Clubcard, and within a short while they were the UK’s largest retailer. This was no coincidence.

In today’s world of exploding data sources, from social networks to mobile internet to digital TV, there is an ocean of user data. Increasingly this is being called Big Data. In an industry fond of buzzwords, this basically means that the data is complex rather than just big. From video blogs to Twitter to pipeline sensor data, it all needs to be picked apart, analysed and, where appropriate, acted upon.

But it can be big too. One social gaming company is receiving and processing four billion data events a day from its gaming community. This has led to a requirement for, and the development of, new types of data platform, such as NoSQL databases and Hadoop.
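As a flavour of what processing such an event stream involves, here is a minimal Python sketch that tallies event types from newline-delimited JSON records; the field names ("event_type", "player_id") are illustrative assumptions, not any particular vendor’s format.

```python
import json
from collections import Counter

def count_events(lines):
    """Tally event types from newline-delimited JSON event records."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        counts[event.get("event_type", "unknown")] += 1
    return counts

sample = [
    '{"event_type": "level_up", "player_id": 1}',
    '{"event_type": "purchase", "player_id": 2}',
    '{"event_type": "level_up", "player_id": 3}',
]
print(count_events(sample))  # Counter({'level_up': 2, 'purchase': 1})
```

At real volumes this tallying is distributed across many machines, which is precisely the gap that Hadoop-style platforms fill.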

We are seeing an explosion of these types of roles, and although this technology will not replace ‘traditional’ SQL databases, it is certainly going to be an important weapon in the armoury for understanding Big Data.

A position that developed as a result of this new area within the field of information management is the data scientist.



Business intelligence

Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analysing and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

Business intelligence applications can be mission-critical and integral to an enterprise’s operations or occasional to meet a special requirement; enterprise-wide or local to one division, department, or project; centrally initiated or driven by user demand.

This term was used as early as September 1996, when a Gartner Group report said:

“By 2000, information democracy will emerge in forward-thinking enterprises, with business intelligence information and applications available broadly to employees, consultants, customers, suppliers and the public. The key to thriving in a competitive marketplace is staying ahead of the competition. Making sound business decisions based on accurate and current information takes more than intuition. Data analysis, reporting and query tools can help business users wade through a sea of data to synthesize valuable information from it – today these tools collectively fall into a category called business intelligence.”



Cognos

Cognos, based in Ottawa, Canada, produce Business Intelligence software and were acquired by IBM in 2008.

The latest version, Cognos 10, lets you explore any data, in any combination and over any time period with a broad range of analytics capabilities. It includes query and reporting tools, analysis tools, dashboard tools and scorecarding tools.



Data governance

Data governance is an emerging discipline with an evolving definition. The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organisation. Through data governance, organisations are looking to exercise positive control over the processes and methods used by their data stewards and data custodians to handle data.

Data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. Data governance ensures that data can be trusted and that people can be made accountable for any adverse event that happens because of low data quality. It is about putting people in charge of fixing and preventing issues with data so that the enterprise can become more efficient. Data governance also describes an evolutionary process for a company, altering the company’s way of thinking and setting up the processes to handle information so that it may be used by the entire organisation. It is about using technology, in many forms where necessary, to aid the process. When companies want, or are required, to gain control of their data, they empower their people, set up processes and get help from technology to do it.

There are some commonly cited vendor definitions for data governance. Data governance is a quality control discipline for assessing, managing, using, improving, monitoring, maintaining, and protecting organisational information. It is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.



Data migration

Data migration is the process of transferring data between storage types, formats or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organisations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organisations that use them undergo a merger or takeover).

To achieve an effective data migration procedure, data on the old system is mapped to the new system providing a design for data extraction and data loading. The design relates old data formats to the new system’s formats and requirements. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system.

If a decision has been made to provide a set input file specification for loading data onto the target system, this allows a pre-load ‘data validation’ step to be put in place, interrupting the standard ETL process. Such a data validation process can be designed to interrogate the data to be transferred, to ensure that it meets the predefined criteria of the target environment and the input file specification. An alternative strategy is to have on-the-fly data validation occurring at the point of loading, which can be designed to report on load rejection errors as the load progresses. However, in the event that the extracted and transformed data elements are highly ‘integrated’ with one another, and the presence of all extracted data in the target system is essential to system functionality, this strategy can have detrimental, and not easily quantifiable effects.
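A minimal sketch of such a pre-load validation step in Python; the specification format and field names are invented for illustration, not taken from any particular migration tool.

```python
# Rows are checked against a simple input file specification before load;
# failures are collected as rejections rather than loaded into the target.
SPEC = {
    "customer_id": {"required": True, "type": int},
    "email":       {"required": True, "type": str, "max_len": 254},
    "country":     {"required": False, "type": str, "max_len": 2},
}

def validate(row):
    """Return a list of rule violations for one extracted row."""
    errors = []
    for field, rules in SPEC.items():
        value = row.get(field)
        if value is None:
            if rules.get("required"):
                errors.append(f"{field}: missing required value")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
        elif rules["type"] is str and len(value) > rules.get("max_len", 10**6):
            errors.append(f"{field}: exceeds {rules['max_len']} characters")
    return errors

rows = [{"customer_id": 42, "email": "a@example.com", "country": "GB"},
        {"customer_id": "x", "email": None}]
for row in rows:
    problems = validate(row)
    print("REJECT" if problems else "OK", problems)
```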

After loading into the new system, results are subjected to data verification to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss.

Automated and manual data cleaning is commonly performed in migration to improve data quality, eliminate redundant or obsolete information and match the requirements of the new system.

Data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated several times before the new system is deployed.



Data scientist

Data scientists are shaping up as crucial to companies looking to be leaders in big data analytics. Organisations want to understand who these people are, how they are different, and how to get them to work at their own organisation.

The role of data scientist touches more elements of an organisation’s data, and a wider array of departments, than existing BI roles. Data scientists focus more on data mining and advanced analytics tools than those in traditional BI, and are more likely to base business decisions directly on data. Across an enterprise, the data scientist role brings data capabilities to a broader spread of departments, though BI professionals maintain an edge in connections with business management, marketing and sales.



DataStage

InfoSphere DataStage, originally developed by Ascential Software, is an ETL tool and part of the IBM Information Platforms Solutions suite. It uses a graphical notation to construct data integration solutions and is available in various versions such as the Server Edition and the Enterprise Edition.

The InfoSphere brand also has a suite of related products, such as QualityStage, ProfileStage, MetaStage and AuditStage.

In March 2005 IBM acquired Ascential Software and made DataStage part of the WebSphere family as WebSphere DataStage. In 2006 the product was released as part of the IBM Information Server under the Information Management family but was still known as WebSphere DataStage. In 2008 the suite was renamed to InfoSphere Information Server and the product was renamed to InfoSphere DataStage. Confused? We were too.



Data warehouse

A data warehouse (DWH) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems using a process known as ETL (Extract, Transform and Load). The data may pass through an operational data store for additional operations before it is used in the DWH for reporting.

The typical data warehouse uses staging, integration, and access layers to house its key functions. The staging layer stores raw data, the integration layer integrates the data and moves it into hierarchical groups and the access layer helps users retrieve data.
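A minimal sketch of the three-layer pattern in Python, using SQLite and invented table names: raw rows land in staging, are conformed in the integration layer, and are exposed to users through an access-layer view.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_sales (sold_on TEXT, amount TEXT)")   # staging: raw, untyped
con.execute("CREATE TABLE int_sales (sold_on DATE, amount REAL)")   # integration: cleaned, typed
con.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                [("2012-01-03", "19.99"), ("2012-01-04", "5.00")])

# Integration layer: cast and clean the staged data.
con.execute("INSERT INTO int_sales "
            "SELECT DATE(sold_on), CAST(amount AS REAL) FROM stg_sales")

# Access layer: a reporting view over the integrated data.
con.execute("CREATE VIEW v_daily_sales AS "
            "SELECT sold_on, SUM(amount) AS total FROM int_sales GROUP BY sold_on")
print(con.execute("SELECT * FROM v_daily_sales").fetchall())
```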

Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyse data, to extract, transform and load data and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository and tools to manage and retrieve metadata.



DB2

DB2 has a long history and is considered by some to have been the first database product to use SQL (also developed by IBM). However, Oracle released a commercial SQL database product somewhat earlier than IBM did.

DB2 is a family of relational database management system (RDBMS) products from IBM that serve a number of different operating system platforms. According to IBM, DB2 leads in terms of database market share and performance. Although DB2 products are offered for UNIX-based systems and personal computer operating systems, DB2 trails Oracle’s database products on UNIX-based systems and Microsoft’s Access on Windows systems. In addition to its offerings for the mainframe OS/390 and VM operating systems and its mid-range AS/400 systems, IBM offers DB2 products for a cross-platform spectrum that includes UNIX-based SCO UnixWare and its personal computer OS/2 operating system, as well as Microsoft’s Windows and earlier systems. DB2 databases can be accessed from any application program by using Microsoft’s Open Database Connectivity (ODBC) interface, the Java Database Connectivity (JDBC) interface, or a CORBA interface broker.
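As an illustration of the ODBC route, a hedged Python sketch using the pyodbc library; the driver name, host, port and credentials are placeholders that will vary with the DB2 client installed.

```python
import pyodbc

# Connection string keywords follow the IBM DB2 ODBC driver conventions;
# all values below are placeholders for a real DB2 installation.
conn = pyodbc.connect(
    "DRIVER={IBM DB2 ODBC DRIVER};"
    "HOSTNAME=db2host.example.com;PORT=50000;PROTOCOL=TCPIP;"
    "DATABASE=SAMPLE;UID=db2user;PWD=secret"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM SYSCAT.TABLES")  # DB2 catalogue view
print(cursor.fetchone()[0])
conn.close()
```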



ETL (Extract, Transform and Load)

ETL is short for extract, transform and load, three database functions that are combined into one tool to pull data out of one database and place it into another. Extraction is the process of reading data from a database. Transformation is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data. Loading is the process of writing the data into the target database.

ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.
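A minimal, self-contained sketch of the three steps in Python, using SQLite databases as stand-ins for the source and target systems; the schemas are invented for illustration.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (name TEXT, joined TEXT)")
source.execute("INSERT INTO customers VALUES (' Ada LOVELACE ', '2011-05-01')")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dim_customer (name TEXT, join_year INTEGER)")

# Extract: read from the source system.
rows = source.execute("SELECT name, joined FROM customers").fetchall()

# Transform: apply rules to put the data in the target's required form.
cleaned = [(name.strip().title(), int(joined[:4])) for name, joined in rows]

# Load: write into the target system.
target.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleaned)
print(target.execute("SELECT * FROM dim_customer").fetchall())
# [('Ada Lovelace', 2011)]
```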



Exadata

Oracle Exadata is a database appliance with support for both OLTP and OLAP workloads. It was initially designed in collaboration between Oracle Corporation and Hewlett-Packard: Oracle designed the database, operating system (based on the Oracle Linux distribution) and storage software, while HP designed the hardware. With its acquisition of Sun Microsystems, Oracle announced the release of Exadata Version 2, with improved performance and the use of Sun storage technologies.

Exadata was announced by Larry Ellison at the 2008 Oracle OpenWorld conference in San Francisco for immediate delivery. The main headline was that Oracle was entering the hardware business with a pre-built Database Machine, engineered by Oracle. The hardware at that time was manufactured, delivered and supported by HP. Since the acquisition of Sun Microsystems by Oracle in January 2010, Exadata has used Sun-based hardware. Oracle claims that it is the fastest database server on the planet.



IBM

IBM sells an extensive portfolio of data-related products. If you need a database (DB2 or Informix), an MPP appliance (Netezza), data integration or master data management (WebSphere/DataStage), BI/reporting (Cognos) or analytics (SPSS), the chances are that IBM will have a product to suit.



Informatica

Informatica are a US-based information and data integration software company specialising in enterprise and corporate-level software requirements.

Their flagship product, PowerCenter, is one of the leading ETL tools on the market. They have a comprehensive suite of data management products, including a master data management (MDM) product, data quality and other data integration products and related tools, as well as a cloud offering.



Kalido

Kalido was born out of a data management project at the oil company Shell and is now headquartered in Boston, USA. It produces a range of data management products, including its flagship data warehouse tool, the Dynamic Information Warehouse (DIW), a master data management (MDM) tool, a Business Information Modeller (BIM) and a data governance tool. Kalido’s tools can run in traditional Oracle or MS SQL Server environments as well as MPP environments such as Netezza, Teradata and Exadata. KDR Recruitment has in-depth knowledge of this super-niche market, having worked with Kalido customers since the product’s inception.

At any one time, KDR Recruitment has around one third of the UK’s skilled Kalido contractors out working on our client sites, and we are aware of the current project status of the rest.



Kognitio

Kognitio WX2 is a high-performance, fully relational database designed specifically to enable ad-hoc analysis of complex, large-volume granular data. Very high levels of performance are achieved without the need for indexing or pre-partitioning of the data. This gives users flexible and unconstrained access to the data, and significantly reduces system set-up and maintenance costs.

Kognitio WX2 consolidates the full power of groups of unmodified industry-standard, low-cost servers (high performance blade servers) into a single, high performance ‘virtual’ database server. No proprietary hardware is required to create this MPP (Massively Parallel Processing) database environment.



Master data management

Master data management (MDM) comprises a set of processes and tools that consistently defines and manages the master data of an organisation (which may include reference data). MDM has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organisation, to ensure consistency and control in the ongoing maintenance and application use of this information.

At a basic level, MDM seeks to ensure that an organisation does not use multiple (potentially inconsistent) versions of the same master data in different parts of its operations, which can occur in large organisations. A common example of poor MDM is the scenario of a bank at which a customer has taken out a mortgage and the bank begins to send mortgage solicitations to that customer, ignoring the fact that the person already has a mortgage account relationship with the bank. This happens because the customer information used by the marketing section within the bank lacks integration with the customer information used by the customer services section of the bank. Thus the two groups remain unaware that an existing customer is also considered a sales lead. The process of record linkage is used to associate different records that correspond to the same entity, in this case the same person.
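A minimal sketch of the record linkage idea in Python; the match rule (normalised name plus postcode) is a deliberate simplification of what real MDM matching engines do, and the record fields are invented.

```python
def match_key(record):
    """Build a crude match key from normalised name and postcode."""
    name = "".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)

# The same customer as seen by two departmental systems.
marketing = {"name": "J. Smith", "postcode": "m1 1ae", "segment": "prospect"}
servicing = {"name": "J.Smith",  "postcode": "M1 1AE", "products": ["mortgage"]}

if match_key(marketing) == match_key(servicing):
    # Same entity: the "prospect" already holds a mortgage with the bank.
    master = {**marketing, **servicing}
    print("Linked master record:", master)
```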

Other problems include (for example) issues with the quality of data, consistent classification and identification of data and data-reconciliation issues.

One of the most common reasons some large corporations experience massive issues with MDM is growth through mergers or acquisitions.



MicroStrategy

MicroStrategy is an independent company producing business intelligence, online analytical processing (OLAP) and reporting software.

MicroStrategy is a leading provider of enterprise software platforms for business intelligence (BI), mobile intelligence and social intelligence applications. MicroStrategy’s BI platform enables leading organisations worldwide to analyse the vast amounts of data stored across their enterprises to make better business decisions. Companies choose MicroStrategy BI for its ease-of-use, sophisticated analytics and superior data and user scalability.

The MicroStrategy BI platform delivers actionable information to business users via the web and mobile devices. MicroStrategy’s mobile intelligence platform helps companies and organisations build, deploy, and maintain mobile apps across a range of solutions by embedding intelligence, transactions and multimedia into apps.

MicroStrategy’s social intelligence platform includes a number of applications that help enterprises harness the power of social networks for marketing and e-commerce, as well as a suite of free “friendly” consumer apps that use MicroStrategy’s enterprise technologies. MicroStrategy’s social intelligence platform helps companies leverage the value of social networks to better understand and engage their customers and fans.

The MicroStrategy Cloud offering combines MicroStrategy and third-party software, hardware, and services to enable rapid, cost-effective development of hosted BI, mobile, and social applications.



MPP

MPP stands for Massively Parallel Processing: the structured, highly coordinated processing of a single program by more than one processor, often all contained in one ‘box’. Because many processors operating at the same time can process large amounts of data quickly, this is an important market in the data warehousing world, originated by NCR’s Teradata machines. Newer entrants to this market include Netezza (now an IBM company), DATAllegro (now part of Microsoft’s BI offering) and Oracle’s Exadata, plus independent offerings from Kognitio and ParAccel.
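A toy sketch of the principle in Python: data is partitioned across workers, each computes a partial result over only its own partition, and the partials are combined. Real MPP appliances do this across independent server nodes rather than operating system processes.

```python
from multiprocessing import Pool

def partial_sum(partition):
    """Each 'node' aggregates only the rows it owns."""
    return sum(partition)

if __name__ == "__main__":
    rows = list(range(1_000_000))
    nodes = 4
    # Distribute rows across the nodes (round-robin partitioning).
    partitions = [rows[i::nodes] for i in range(nodes)]
    with Pool(nodes) as pool:
        partials = pool.map(partial_sum, partitions)
    print(sum(partials))  # combine step: 499999500000
```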



MS SQL Server

Microsoft SQL Server is a relational database server developed by Microsoft. It is a software product whose primary function is to store and retrieve data as requested by other software applications, whether those run on the same computer or on another computer across a network (including the internet). There are at least a dozen editions of Microsoft SQL Server aimed at different audiences and workloads, ranging from small applications that store and retrieve data on the same computer, to systems serving millions of users and computers that access huge amounts of data over the internet at the same time.

True to its name, Microsoft SQL Server’s primary query languages are T-SQL and ANSI SQL.
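As a small illustration, a hedged Python sketch querying SQL Server via pyodbc; the driver version, server name and credentials are placeholders.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost.example.com;DATABASE=Sales;UID=app;PWD=secret"
)
cursor = conn.cursor()
# T-SQL specifics such as TOP are accepted alongside ANSI SQL.
cursor.execute("SELECT TOP 5 name FROM sys.tables ORDER BY name")
for (table_name,) in cursor.fetchall():
    print(table_name)
conn.close()
```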

SQL Server can be developed into a relatively low-cost BI solution, as the main tools (SSAS/SSIS/SSRS) are bundled in and sold with the database.



Netezza

Netezza, acquired by IBM in 2010, provides computer hardware and software; its principal product is an MPP data warehouse appliance.

IBM Netezza high-performance data warehouse appliances have revolutionised how organisations approach data warehousing and analytics. Engineered from the ground up for analysis of large data volumes, these appliances are purpose-built to make advanced analytics simpler, faster and more accessible. They offer a complete solution that integrates database, advanced analytics, server and storage to deliver the performance, value and simplicity organisations need to handle their rapidly expanding data.

The IBM Netezza data warehouse appliance family comprises three lines: the IBM Netezza 1000 series, the IBM Netezza High Capacity Appliance series and the IBM Netezza 100 test and development unit. These appliances have set the standard for analytical processing power — now scaling into the petabytes — while significantly lowering the total cost of ownership. This translates into a powerful, cost-effective and easy-to-install way for organisations to scale their business intelligence (BI) and analytical infrastructure.

Each IBM Netezza data warehouse appliance features IBM Netezza Analytics, an embedded software platform that fuses data warehousing and predictive analytics to provide petascale performance. IBM Netezza Analytics provides the technology infrastructure to support enterprise deployment of parallel in-database analytics. The programming interfaces and parallelisation options make it straightforward to move a majority of analytics inside the appliance, regardless of whether they are being performed using tools such as IBM SPSS or SAS or written in languages such as R, C/C++ or Java.



OBIEE

Oracle Business Intelligence Enterprise Edition (OBIEE) is a complete, open and architecturally unified business intelligence solution for the enterprise that delivers capabilities for reporting, ad hoc query and analysis, OLAP, dashboards, and scorecards. All enterprise data sources, as well as metrics, calculations, definitions and hierarchies, are managed in a Common Enterprise Information Model, providing users with accurate and consistent insight regardless of where the information is consumed. Users can access and interact with information in multiple ways, including web-based interactive dashboards, collaboration workspaces, search bars, ERP and CRM applications, mobile devices and Microsoft Office applications.

Sometimes OBIEE is used interchangeably with Oracle Business Intelligence Applications (OBIA), a pre-built BI and data warehousing solution built on the OBIEE technology stack; however, OBIEE is the platform whereas OBIA is an application that uses it. OBIEE Plus integrates the components of the toolset, including a service-oriented architecture, data access services, an analytic and calculation infrastructure, metadata management services, a semantic business model, a security model and user preferences, and administration tools.



Oracle

Oracle Corporation is an American multinational computer technology corporation that specialises in developing and marketing computer hardware systems and enterprise software products – particularly database management systems, the flagship Oracle database and the MPP Exadata. Headquartered in California and employing approximately 111,298 people worldwide as of 30 November 2011, it has enlarged its share of the software market through organic growth and through a number of high-profile acquisitions. By 2007 Oracle had the third-largest software revenue, after Microsoft and IBM.

The company also builds tools for database development and systems of middle-tier software, enterprise resource planning software (ERP), customer relationship management software (CRM) and supply chain management (SCM) software.



ParAccel

ParAccel is a venture-backed company focused on developing a next-generation platform purpose-built from the ground up for analytic workloads. During its most recent funding round, existing investors were joined by Amazon.com. ParAccel is based on the vision of founder and CTO, Barry Zane.

In today’s analytics-driven environment, gaining fast and accurate business insights from massive volumes of structured and unstructured data provides significant strategic advantage. As a leader in the high performance analytics market, ParAccel enables organisations to address their most dynamic and complex analytic challenges and rapidly gain ultra-fast, deep insights from very large data sets. ParAccel’s Fortune 1000 customers include companies in the financial services, retail and digital media industries as well as government agencies. Each organisation uses ParAccel to address their business-critical data issues that lie outside the scope of conventional data warehouses and existing analytic tools.



QlikTech

QlikTech is a business intelligence (BI) software company based in Pennsylvania. QlikTech is the provider of QlikView, a business intelligence solution that delivers enterprise-class analytics and search. Its in-memory associative search technology makes calculations in real time, enabling business professionals to gain insight through intuitive data exploration. QlikView can be deployed on premise, in the cloud, or on a laptop or mobile device, for anything from a single user to a large global enterprise.



SAP Business Objects

SAP Business Objects (a.k.a. BO, BOBJ) is a French enterprise software company specialising in business intelligence (BI). Since 2007, it has been a part of SAP AG. The company claimed more than 46,000 customers worldwide in its final earnings release. Its flagship product is BusinessObjects XI, with components that provide performance management, planning, reporting, query and analysis and enterprise information management. Business Objects also offers consulting and education services to help customers deploy its business intelligence projects. Other Business Objects toolsets allow universes and ready-written reports to be stored centrally and made selectively available to communities of password-protected users; the portfolio also includes the SAP Data Services product set.



SAP Data Integrator

Business Objects Data Integrator is a data integration and ETL tool that was previously known as ActaWorks. Newer versions of the software include data quality features and are named SAP BusinessObjects Data Services (BODS). The Data Integrator product consists primarily of a Data Integrator Job Server and the Data Integrator Designer. It is commonly used for building data marts, ODS systems and data warehouses.

Additional transformations can be performed using the DI scripting language, which lets developers call the provided data-handling functions to define inline complex transforms or build custom functions.



SSAS

Microsoft SQL Server 2005 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting you design, create, and manage multi-dimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets you design, create, and visualise data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
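SSAS itself is usually queried with MDX, but the core idea of a cube — a measure aggregated across dimensions — can be illustrated with a simple pivot. A minimal sketch using pandas with invented data, shown only to convey the concept rather than the SSAS API:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["Widget", "Gadget", "Widget", "Gadget"],
    "amount":  [100, 150, 80, 120],
})
# Aggregate the 'amount' measure across two dimensions, as a cube would,
# with 'All' totals along each dimension (margins).
cube = sales.pivot_table(values="amount", index="region",
                         columns="product", aggfunc="sum", margins=True)
print(cube)
```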



SSIS

Microsoft Integration Services (SSIS) is in the ETL family of tools and is a platform for building enterprise-level data integration and data transformation solutions. You use Integration Services to solve complex business problems by copying or downloading files, sending email messages in response to events, updating data warehouses, cleansing and mining data, and managing SQL Server objects and data. The packages can work alone or in concert with other packages to address complex business needs. Integration Services can extract and transform data from a wide variety of sources, such as XML data files, flat files and relational data sources, and then load the data into one or more destinations.

Integration Services includes a rich set of built-in tasks and transformations, tools for constructing packages, and the Integration Services service for running and managing packages. You can use the graphical Integration Services tools to create solutions without writing a single line of code, or you can program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.



SSRS

Microsoft SQL Server Reporting Services (SSRS) is a server-based report generation software system. It can be used to prepare and deliver a variety of interactive and printed reports, and it is administered via a web interface. Reporting Services features a web services interface to support the development of custom reporting applications. SSRS is included in the Developer, Standard and Enterprise editions of SQL Server as an install option. Reporting Services was first released in 2004 as an add-on to SQL Server 2000. The second version was released as part of SQL Server 2005 in November 2005. The latest version was released as part of SQL Server 2008 in August 2008.



SPSS

SPSS was acquired by IBM in 2009. It is a statistical analysis and data mining package used for survey authoring and deployment, data mining, text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services).



Talend

Talend is an open source software vendor that provides data integration, data management and enterprise application integration software and solutions. Headquartered in Suresnes, France and Los Altos, California, Talend has offices in North America, Europe and Asia and a global network of technical and services partners. Customers include eBay, Virgin Mobile and Allianz. Talend is an Apache Software Foundation sponsor and many of its engineers are major contributors to Apache.



Teradata

Teradata is an enterprise software company that develops and sells a relational database management system (RDBMS) of the same name. Teradata was a division of NCR Corporation, which acquired it on February 28, 1991. On January 8, 2007, NCR announced that it would spin off Teradata as an independently traded company; the spin-off was completed on October 1 of the same year.

The Teradata product is referred to as a “data warehouse system” and stores and manages data. The data warehouses use a “shared nothing” architecture, which means that each server node has its own memory and processing power; this is otherwise known as massively parallel processing (MPP). Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them. Teradata sells applications and software to process different types of data. In 2010, Teradata added text analytics to track unstructured data, such as word processor documents, and semi-structured data, such as spreadsheets.
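A toy Python sketch of the shared-nothing placement idea: hashing a primary index value decides which node owns each row, so nodes can work on their own data without sharing memory or disk. Node count and row values are invented for illustration.

```python
import hashlib

NODES = 4

def owning_node(primary_index_value):
    """Hash the primary index to pick the node that stores this row."""
    digest = hashlib.md5(str(primary_index_value).encode()).hexdigest()
    return int(digest, 16) % NODES

rows = ["cust-%03d" % i for i in range(8)]
for row in rows:
    print(row, "-> node", owning_node(row))
```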
