A comprehensive model for management and validation of federal big data analytical systems
© The Author(s) 2017
Received: 3 May 2016
Accepted: 21 October 2016
Published: 10 January 2017
In this era of data science, many software vendors are rushing to provide better solutions for data management, analytics, validation and security. The government, being one of the most important customers, is riding the wave of data and business intelligence. However, federal agencies have specific requirements, bureaucracies, rules and regulations for data-related processes that entail special models for building and managing data analytical systems. In this paper, and based on work done at the US government, a model for data management and validation is introduced: the Federal Model for Data Management and Validation (FedDMV). FedDMV is a 4-step model that comprises a set of best practices, databases, software tools and analytics. Automated procedures are used to develop and maintain the system, and association rules are used to improve its quality.
After working with multiple engineers and analysts at the federal agency, there is a general consensus that FedDMV is easy to follow (please refer to the experimental survey). To quantify that satisfaction, however, three experimental studies were performed: the first is a comparison to other state-of-the-art development models at the government; the second is a survey collected at the government to quantify the level of satisfaction with FedDMV and its tool; and the third is a data validation study performed through detailed testing of the federal system (using an Association Rules algorithm).
To develop a safe and sound federal data analytical system, a tested and rigorous model is required. There is a lack of government-specific models in industry and research. FedDMV aims to provide solutions and guided steps to facilitate the development of data analytics systems given the governmental constraints. FedDMV deals with unstructured data that streams from multiple sources, automates steps that are usually manual, validates the data and maximizes its security. The results of the experimental work are recorded and reported in this manuscript.
Keywords: Data management; Big data analytics; Validation; Unstructured data; Federal agency
The promise of intelligent and accurate predictions in software systems that was previously pursued by many [1, 2] is now being transformed into a new testament of big data analytics. Undoubtedly, data analytics is a buzzword, if not the “buzziest” word of the day [3–6]; it is a science that lies at the nexus of Statistics and Artificial Intelligence (AI). The goal of data analytics is to return an intelligent and more focused version of large and impersonal datasets, provide quick insights into data, and help with visualization and decision making. The literature has many examples of successful applications of data analytics, not only to specific business-driven functions, but also to many industrial and research domains such as Healthcare, Education, Banking & Finance, and many others. The Government is no different: most federal agencies understand that data can help them unlock the government of the future towards better operations, better citizen service, and more efficient decision making.
Data engineering at the US government
A group of databases
Tools that manage the databases
A set of validation, security and safety procedures
The outcomes of the data system such as dashboards and visualizations.
GAO’s Fourteen Agile Federal Challenges
Table: Federal Challenges
Teams had difficulty collaborating closely.
Procurement practices may not support Agile projects.
Teams had difficulty transitioning to self-directed work.
Customer did not trust iterative solutions.
Staff had difficulty committing to more timely and frequent input.
Teams had difficulty managing iterative requirements.
Agencies had trouble committing staff.
Compliance reviews were difficult to execute within an iteration time frame.
Timely adoption of new tools was difficult.
Federal reporting practices do not align with Agile.
Technical environments were difficult to establish and maintain.
Traditional artifact reviews do not align with Agile.
Agile guidance was not clear.
Traditional status tracking does not align with Agile.
In a thesis/study published at Princeton University, data structuring was identified as one of the major challenges at federal agencies [17, 18]: “When government does collect and publish data in a reusable way, government enables third-party stakeholders like advocates, academics, journalists and others to powerfully adapt its data in any way they see fit using the latest technologies, and to add value in unexpected ways. Third parties can use government data to experiment in parallel, in order to discover what innovations work best in changing technological environments”. To engineer better data systems however, it is important to first understand the nature of data available in the government; the next section discusses existing and previous federal experiences in this realm.
This paper is structured as follows: the next sub-section looks into what has been already done in the government in terms of software engineering and big data analytics. The section after discusses related work and current federal challenges. Afterwards, the main contribution of this paper, the Federal Data Management and Validation Model (FedDMV) is presented. Subsequently, two experimental studies and survey (on the usability of the tool) are introduced; and the last section concludes the paper with results and future work.
Unstructured data at the US government
To address a legislative issue, for decision making, and for media announcements, most federal agencies require data that streams from other federal departments; or in some cases even city, county or state governments. Some of that data is shared with the public.
Besides the aforementioned www.data.gov, the PACER system (Public Courts Online Access System), the BLS (Bureau of Labor Statistics), and NASS (National Agricultural Statistics Service) are major examples of federal data systems that adopted the open data initiative and made their data public to citizens. States and cities started publishing data to the public as well. For example, the cities of Chicago and San Francisco both have open data portals, data.cityofchicago.org and data.sfgov.org respectively. However, different parties share and store the data in different formats, varying standards, and using different technologies. How is federal data used collectively then? How is it shared across agencies and governments?
Related work: data management models
Lifecycles in software engineering have led to major shifts in the technical progress of the engineering world. The transition from Waterfall development to Spiral, from Traditional to Agile, and from product-based to contextual user-based [29, 30] development left an obvious fingerprint on the world of engineering. Do these models apply to developing a data analytics system? Although data analytics is considered a fairly novel field, this section introduces the most prominent models for data analytical systems and how they are utilized.
Data analytics development for non-federal systems
Identify and formulate the problem
Prepare the data (pivoting and data cleansing)
Data exploration (summary statistics, bar charts and other means of exploration)
Data transformation and selection (select ranges, and subsets)
Statistical model development
Validation and deployment
Evaluate and monitor results of model with data
Deliver and refine the model.
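The steps above can be sketched, under assumptions, as a minimal pure-Python pipeline; the records, field names, and the trivial mean-based “model” are invented for illustration and are not part of any federal system.

```python
import statistics

# Step 1 (hypothetical problem): summarize trade values per country.
raw = [
    {"country": "GER", "value": "70.00"},
    {"country": "GER", "value": "85.50"},
    {"country": "FRA", "value": None},   # missing value to be cleansed
    {"country": "FRA", "value": "120.00"},
]

# Step 2: prepare the data (cleansing: drop rows with missing values).
clean = [r for r in raw if r["value"] is not None]

# Step 3: data exploration (summary statistics).
values = [float(r["value"]) for r in clean]
summary = {"n": len(values), "mean": statistics.mean(values)}

# Step 4: transformation and selection (subset one country).
ger = [v for v, r in zip(values, clean) if r["country"] == "GER"]

# Steps 5-7: a trivial "model" (mean predictor), checked against the subset.
prediction = statistics.mean(ger)
assert abs(prediction - 77.75) < 1e-9
```

A real project would substitute a genuine statistical model and a held-out validation set for the mean predictor, but the shape of the workflow is the same.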
When software vendors like Tableau, SAS, SPSS and others realized that the government is highly interested in data analytics as well, they started to propose different models to the government to win its projects. The next section discusses some of these proposed models, and later sections of the paper introduce our new model, the Federal Data Management and Validation Model (FedDMV), and contrast it with an existing relevant model.
Federal data management, validation and analytics
As already established, federal departments and agencies are very good at generating and collecting data, but not as good at storing and sharing it. Government IT, however, is constantly challenged to make this data available to employees, the media, the public and other agencies. To address such challenges, multiple software vendors produced federal-specific solutions. Actuate, for example, introduced BIRT (a data driven model). It is one of the most used models in government; it has been used by more than a million users and features an active community, as Actuate claims. The BIRT process matches most federal regulations and certifications. BIRT provides a list of reports and dashboards that agencies can plug into their data; it has dashboards that deal with federal data sharing, data fraud detection, performance management, and citizen self-service. Actuate deploys BIRT to support federal data operations at the Department of Defense, Federal Aviation Administration, US Equal Employment Opportunity Commission, and others. BIRT, however, just like most “commercial solutions”, is not easy to manage without the support of Actuate, and it implies high dependence on the vendor. It has the advantage of providing dashboards and quick-ready solutions, but the dependency limits federal agencies and departments from having their own controlled data infrastructure. Another major player in the federal data area is Salient. Salient calls their solution the Federal Mission Software Solution (FMSS). FMSS is based on agile development; it provides electronic workflows for data such as: electronic federal signatures, electronic disbursement of funds through the US treasury, electronic payment collection from citizens through www.pay.gov, and infrastructure management (among other workflows).
Although some Salient solutions are deployed at the US Air Force, the US Army, the Department of Commerce, and many others, Salient doesn't provide a comprehensive workflow that federal agencies can follow to build a data system; rather, the models by Salient are tailored to solve specific problems. The corresponding quasi-solutions to those problems are assembled, and on that basis Salient claims to provide a comprehensive federal solution for data analytics. Neither BIRT nor FMSS is sufficient for a federal agency or department to cover its many data management requirements.
Therefore, there is a gap in the literature for in-house, federally driven models that guide the implementation of a data system. Additionally, there is a lack of validation and data sharing techniques in the federal government. What is needed is:
A model that facilitates data streaming between agencies.
A model that validates the data, maximizes its security and provides a standardization mechanism.
A model that is built in-house and without dependency on commercial products.
A model that can handle big amounts of data.
Based on what has been discussed in previous sections, and due to the immediate need for a management model with the aforementioned four characteristics, this paper introduces the Federal Data Management and Validation Model (FedDMV) to address these challenges and fill the existing void. FedDMV has been used and experimented with at a federal agency. It is introduced in the next section.
The model for federal analytical systems
This section introduces the main components of FedDMV; the main contribution of this paper. The major overall challenge with big data is non-repetitive unstructured data. Structured data is much easier to manage, and if the data is repetitive, then it is also easier to predict its contents and therefore easier to control [36–38]. However, what is the best way to stream and structure data? When data resides at a federal agency, how is it validated? FedDMV provides answers to these problems.
The federal data management, streaming and validation
Besides guiding the development team through the steps of developing an analytical data system, FedDMV aims to solve three main issues:
1. Data volume challenges: The quantity of data that federal agencies import and generate is large (thus making it a big data challenge). The data is highly diverse, and the speed at which it flows/updates from multiple sources is high (much of it daily, some weekly, and some monthly/yearly).
2. Distribution challenges: The federal agency aims to build visuals and shareable solutions.
3. Data quality: Validating federal data has many challenges, and requires many resources.
- 1. Data Management:
Data Sources Variables Exploration: all the unstructured data from all sources is unified into the new FedDMV centralized database.
Variable Mapping and Columns Unification: redundant (repetitive) variables are concatenated in columns, and potential contents of the columns are identified (including their data types, size, and formats).
Data Standards Creation: this step is the main step for creating Lookup (LU) tables in the database. LUs are dictionaries for standardizing the different forms of data from multiple sources.
- 2. Data Validation: Validate and verify the data in the target Database (DB) using an Association Rules algorithm (discussed in the next section). Validation is done in 3 steps:
Test case collection and execution.
Test case association measurement and evaluation.
Data system refinement.
- 3. Data Security Assurance.
- 4. Federal Restrictions Deployment: deploy federal regulations into the data system.
For example, one data source streams a record such as: EXP TIRE, AUTO PARTS, GER, US $70.00
Similar data is presented in a different form from another data source:
Auto Tires, Germany, Trade Exports from: USA, 70 US Dollars.
Another data source would present similar data like this: GRTIRExUS70; that record needs to be broken down into multiple fields and eventually migrated into data fields in the destination table. The design of the database needs to consider both (and possibly more) formats of data streams. However, similar data fall within the same data group (the data is grouped into different types/groups, and each group consists of data that share common characteristics). The goal is to standardize the data format and to have a single version of the truth in the FedDMV system. The dimensions that need to be standardized cover the where, when and what; these correspond to Geography, Time, and specific subject-matter data respectively. Geography covers Countries, States, Counties, Cities, and so on, while Time addresses Years, Months, Weeks, Hours, and Minutes. Analysts then interact with the data in these tools, and build dashboards and visualizations for publication on federal websites and executives' mobile devices, and for sharing with other users or the general American public.
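As a sketch of how such standardization might work, the snippet below parses the two record formats from the example above into one standard form using hypothetical lookup (LU) tables; the table contents, field names, and regular expression are assumptions for illustration, not the agency's actual standards.

```python
import re

# Hypothetical lookup (LU) tables: dictionaries that map source-specific
# codes and spellings onto one standard form ("single version of the truth").
LU_COUNTRY = {"GER": "Germany", "GR": "Germany", "US": "United States"}
LU_COMMODITY = {"TIRE": "Auto Tires", "TIR": "Auto Tires"}

def standardize_csv(record: str) -> dict:
    """Format 1: 'EXP TIRE, AUTO PARTS, GER, US $70.00'."""
    flow_commodity, _group, country, amount = [f.strip() for f in record.split(",")]
    _flow, commodity = flow_commodity.split(" ", 1)
    return {"flow": "Export",
            "commodity": LU_COMMODITY[commodity],
            "country": LU_COUNTRY[country],
            "value_usd": float(amount.replace("US $", ""))}

def standardize_packed(record: str) -> dict:
    """Format 2: packed code like 'GRTIRExUS70' (country, commodity, flow, value)."""
    m = re.match(r"(GR|US)(TIR)Ex(US)(\d+)", record)
    country, commodity, _origin, value = m.groups()
    return {"flow": "Export",
            "commodity": LU_COMMODITY[commodity],
            "country": LU_COUNTRY[country],
            "value_usd": float(value)}

# Both source formats collapse to the same standardized record.
a = standardize_csv("EXP TIRE, AUTO PARTS, GER, US $70.00")
b = standardize_packed("GRTIRExUS70")
assert a == b
```

In FedDMV the LU tables live in the database rather than in code, but the principle is the same: every incoming variant is mapped through a dictionary to one canonical value.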
Federal data validation
“If the location of bugs can be made more precise, both the calendar time and resource requirements of testing can be reduced. Modern data and software products typically contain millions of lines of code. Precisely locating the source of bugs in that code can be very resource consuming.”
The literature therefore has a gap in error allocation; few, if any, AI-driven methods have been introduced to help analysts and engineers pinpoint errors that would affect the overall health of a data system. Some methods touched on applying AI logic to software engineering, and others developed means to locate errors that are not based on a solid AI approach. FedDMV uses association rules for that purpose; this association-rules-driven testing approach (ART) is presented next.
Association rules validation and testing
Association Rules (AR) is one of the most commonplace data analytical models. AR are intended to identify patterns of the type: “Action B often comes after Action A, and is followed by Action C”. One of the more well-studied problems in data mining is the search for AR in market basket data, usually referred to as Market Basket Analysis (MBA). MBA allows businesses to realize which commodities are bought together, and therefore place them close to each other on store shelves. The outcome of the AR model is a set of rules. Each rule consists of an antecedent and a consequent (e.g., B → C), together with the support and confidence of the rule. Confidence measures the reliability of the inference made by a rule: for a given rule B → C, the higher the confidence, the more likely it is for C to be present in transactions that contain B. Confidence also provides an estimate of the conditional probability of C given B.
Support is an important measure as well, because a rule that has very low support may occur simply by chance. Association Rules outputs should be interpreted thoughtfully: the inference made by an association rule does not necessarily imply causality. Instead, it suggests a strong co-occurrence relationship between the items in the antecedent and the consequent of the rule.
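Support and confidence can be computed directly from transaction data; in the sketch below the transactions are hypothetical, and the functions are a minimal illustration rather than a production AR implementation.

```python
# Hypothetical transactions (e.g., items that co-occur in a data load).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "C"},
    {"B", "C"},
]

def support(itemset: set) -> float:
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent: set, consequent: set) -> float:
    """Estimate of P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

# Rule B -> C: how often does C appear in transactions that contain B?
assert support({"B"}) == 0.8        # B is in 4 of 5 transactions
assert support({"B", "C"}) == 0.6   # B and C co-occur in 3 of 5
assert abs(confidence({"B"}, {"C"}) - 0.75) < 1e-9
```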
1. Data Collection: In order to increase the accuracy of any data mining model, a fair amount of high-quality data is required. More data helps the analysts and engineers obtain a better quality model that could be more insightful.
2. Model Development and Data Training: In this stage, the AR model is built and the data is trained.
3. AR Model Outputs: Outputs of the model are antecedents, consequents, confidence and support.
4. Sort all Predictions: Consequent predictions are sorted by confidence; the top predictions with the highest confidence are then considered for testing with data streaming.
5. Federal Intelligent Testing: Using the outcomes from step 3 and the predictions from step 4, testing is performed, focused on the system modules present in the AR model's consequents.
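The sorting and test-focusing stages above might be sketched as follows; the rule set, module names, and confidence threshold are invented for illustration and are not the federal system's actual modules.

```python
# Hypothetical AR model outputs (stage 3): each rule pairs an antecedent
# module with a consequent module, plus the rule's confidence and support.
rules = [
    {"antecedent": "ingest", "consequent": "lookup",   "confidence": 0.92, "support": 0.40},
    {"antecedent": "lookup", "consequent": "validate", "confidence": 0.65, "support": 0.25},
    {"antecedent": "ingest", "consequent": "publish",  "confidence": 0.81, "support": 0.30},
]

# Stage 4: sort all predictions by confidence, descending.
ranked = sorted(rules, key=lambda r: r["confidence"], reverse=True)

# Stage 5: focus testing on the modules named in the top consequents.
MIN_CONFIDENCE = 0.8  # assumed cut-off for "top" predictions
focus = [r["consequent"] for r in ranked if r["confidence"] >= MIN_CONFIDENCE]
assert focus == ["lookup", "publish"]
```

The effect is that test effort concentrates on the modules most likely to be exercised next, rather than being spread uniformly across the system.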
Federal data security
Database security is about preventing malicious users from accessing or modifying data stored in databases. As with the generic security concept, three key aspects determine how secure a database is: Confidentiality, Integrity, and Availability. Confidentiality refers to only disclosing data to authorized users. To verify a user's identity and control access to data, database management systems (DBMS) use different methods of authentication [61–63]. Integrity refers to protecting the database from unauthorized writing: data stored in databases should not be modified improperly (i.e. corrupted). In the age of “big data”, protecting databases is a critical mission. Access control was one of the earliest database security measures proposed and is widely used. In most cases, access control models can be categorized into three classes: discretionary access control (DAC), mandatory access control (MAC), and role-based access control (RBAC) [62–65]; RBAC is the method adopted by FedDMV, based on the notion that federal data control depends on employees' roles, access levels, and aggregations. The next section introduces the experimental work that was performed with FedDMV.
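A minimal sketch of a role-based access check, assuming invented roles and permissions; a real DBMS enforces RBAC internally (e.g., via GRANT statements on roles), but the decision logic is the same.

```python
# Hypothetical role-to-permission mapping for a federal data system.
ROLE_PERMISSIONS = {
    "analyst":  {"read"},
    "engineer": {"read", "write"},
    "admin":    {"read", "write", "grant"},
}

def authorized(role: str, action: str) -> bool:
    """Access is decided by the user's role, never by identity alone."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert authorized("analyst", "read")
assert not authorized("analyst", "write")   # integrity: no unauthorized writing
assert not authorized("visitor", "read")    # confidentiality: unknown roles denied
```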
FedDMV experimental studies
This section introduces the three main experimental works of this paper: the first compares a big data analytics system built using FedDMV to another built through a traditional relational database (TRDb); the second surveys feedback on FedDMV's tool from federal employees and analysts; the third is a test of ART, FedDMV's validation method. The subsections present pros and cons of the two compared processes, outcomes of the comparison, results of the survey, ART, and the lessons learnt.
Setup for the experiments
FedDMV steps vs. TRDb steps

FedDMV: Standards are created based on the data streams and the data sources; all standards are saved in a knowledge base.
TRDb: Lookups are created based on informal conversations with federal employees.

FedDMV: Automated data management using a software tool.
TRDb: Manual migrations (Excel and SSIS).

FedDMV: An automated Association Rules driven approach is used to validate the data extensively.
TRDb: Manual data validation and verification practices.

FedDMV Tool Development and Testing
FedDMV: A tool that gives federal employees access to the system through a GUI.
TRDb: No such tool available.

Data Security and Federal Restrictions Deployment
FedDMV: Well documented security and privacy routines.
TRDb: Routines created as required.
Data was collected to run the experiment; a total of 633 rows was assembled. The system under test has 11 modules, with an average of 58 functions. Due to proprietary reasons, the nature and details of the system cannot be exposed.
Experimental results and conclusions
This section presents the experimental results of the three studies, pros and cons of FedDMV and TRDb, conclusions and future work.
FedDMV Time Consumption
Data Validation (ART)
FedDMV Tool Development and Testing
Data Security and Federal Restrictions Deployment
TRDb Time Consumption
Requirements Gathering and Variables’ Understanding
Building Database Tables
Data Migration and Management
Business Rules Development
Giving Federal Employees Access to Data System
FedDMV vs. TRDb (Pros and Cons)

FedDMV:
- Easy data management of a centralized DB.
- Better data validation practices that are driven by intelligent methods (association rules).
- No more manual manipulation of data and tables; the analysts use the FedDMV tool (leads to higher security).
- Resilient data integration and standardization routines.
- Less manual work by engineers and federal employees; automated data streaming routines available.
- Easier sharing and publishing; role-based data security.
- Access to advanced analytical capabilities.

TRDb:
- No “single version of the truth”.
- Updates and maintenance difficulties.
- Data not comprehensively structured in tables; difficult publishing and sharing.
- Data overlap/redundancy between different databases, resulting in many inconsistencies; no automated data streaming routines.
- Uses hand-entered data, which increases the risk of errors; data security constantly compromised.
- Lacks tools that allow access to analytics.
For example, 9% of the analysts gave negative feedback, mostly relevant to the notion that they don't see the need to move away from more manual processes. However, 71% gave great or good feedback, and 20% thought that this change is necessary and eventually inevitable.
Discussion and Conclusion
Resource consumption rates: is the process expensive? How many hours? Manpower?
Complexity: how difficult is the model to follow?
Practicality: is the model merely theoretical or is it a good fit for real world projects?
Validity: ensuring that the data and the processes are valid
In the experimental study of this paper, FedDMV is compared with TRDb, a traditional process for building data systems. To assess the four points: in terms of resource consumption, FedDMV presented a significant improvement over traditional federal processes (45 months vs. 81 months), and was successful in delivering a big data system at a federal agency in a timely manner. In terms of complexity and practicality, FedDMV has 4 clear steps, and after working with multiple engineers at the federal agency, there is a general consensus that FedDMV is easy to follow (the tool survey shows that as well). As for validity, FedDMV focuses on validation through ART.
Perform more experiments for ART and general quality assurance of data systems.
Deploy big data systems with other federal agencies. That will put FedDMV to test with different types of data, and different types of processes.
Deploy FedDMV with smaller types of systems and evaluate its feasibility with that.
Compare FedDMV to more data analytics models, in terms of time consumption, usability and applicability - such as PACER and BIRT.
Provide more software tools for managing the FedDMV process, and aid in project planning, tracking resources, and monitoring the 4 steps of the process.
FedDMV focuses on major federal big data aspects such as validation, streaming, security, and automation. In this age of big data, FedDMV is introduced as an effective and efficient 4-step model to follow, and is a very strong candidate for federal agencies that aim to develop new data systems.
Affordable Care Act
Association of Government Accountants
Database Management Systems
Federal Data Management and Validation
Federal Mission Software Solution
Government Accountability Office
Market and Trade Economics Division
National Agricultural Statistics Service
Public Courts Online Access System
Traditional Relational Database
United States Department of Agriculture
The authors would like to convey their thanks to the staff of the United States Department of Agriculture - Economic Research Service (ERS); especially the Applications Development Branch (ADB) and the Market and Trade Economics Division (MTED). The views and opinions expressed in this research paper are those of the authors and do not reflect the official policy or position of any agency or department of the U.S. government.
Additionally, an acknowledgement goes to the following George Mason University students for their help and hard work with the federal projects: Gowtham Ramamoorthy, Manish Dashora and Samantha Dcosta.
Availability of data and materials
The actual data is not shared because it is federal data and parts of the dataset are private or confidential; therefore, we are not allowed to share it. However, to get a sample data set, visit: http://ers.usda.gov/data-products.aspx. All the software tools used in the experiments and as part of FedDMV are available via the online sharing website (Dropbox), upon request from the author.
FB worked on the overall design and development of the FedDMV model and the associated system, executed the experiments, collected data for the comparison study, and interacted with the federal employees on a frequent basis. FB wrote multiple sections of this manuscript and provided the illustrations. RY worked with the federal agency management to set up the experiments. RY wrote multiple sections of the paper and provided multiple insights. LD developed the tool, and executed some of the experiments. LD also wrote some parts of the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Turing AM. Computing machinery and intelligence. Mind. 1950;59:433–60.
- McCarthy J. Programs with common sense. In: Proceedings of the Symposium on Mechanisation of Thought Processes, vol. 1. London: Her Majesty’s Stationery Office; 1958. p. 77–84.
- Koh HC, Tan G. Data Mining Applications in Healthcare. J Healthc Inf Manage. 2005;19(2).
- Jing L. Data Mining Applications in Higher Education. SPSS Executive Report. 2004.
- Book: Industry Applications of Data Mining. Chapter 8, 1999.
- Harman M. The Role of Artificial Intelligence in Software Engineering. Published report at the CREST Centre, University College London.
- An Immediate Release by the Office of Science and Technology Policy. Executive Office of the President. The Big Data Initiative. 2012.
- The White House Big Government Initiative: http://www.whitehouse.gov/Open/.
- White Paper by Accenture. Accenture Federal Services, Federal Analytics and Big Data. 2012.
- Letter from James Madison to W.T. Barry (Aug. 4, 1822), reprinted in The Writings of James Madison (Gaillard Hunt, ed.).
- A White Paper by CC Pace Systems. Agile in the Federal Government. 2014.
- Batarseh FA. Incremental Lifecycle Validation of Knowledge-Based Systems through CommonKADS. PhD Dissertation Registered at the University of Central Florida and the Library of Congress; 2012.
- DePillis L. The Way Government does Tech is Outdated and Risky. A Report Published at the Washington Post; 2013.
- Smith M, Cohen T. A CNN Report on the ACA Website: Problems with Health Website. 2013.
- United States Government Accountability Office (GAO). Software Development - Effective Practices and Federal Challenges in Applying Agile Methods, GAO-12-681, gao.gov. 2012.
- Batarseh F, Gonzalez A. Predicting Failures in Contextual Software Development through Data Analytics. Proc Springer Softw Qual J. 2015.
- Ming-Tun Yu H. Designing Software To Shape Open Government Policy. A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy. The Department of Computer Science, Princeton University; 2012.
- Hahn R. Government Policy toward Open Source Software: An Overview. Chapter 1. Published at the Brookings Institute.
- National Agricultural Statistics Service: http://quickstats.nass.usda.gov/.
- City of Chicago Data Portal: http://data.cityofchicago.org.
- San Francisco Governmental Data Website: http://data.sfgov.org.
- Swish-Data - Data Performance Architects, Hadoop Uses Cases: Big Data for the Government, http://www.swishdata.com/index.php/blog/article/hadoop-use-cases-big-data-for-the-government.
- MapR: Big Data and Apache Hadoop for Government: http://www.mapr.com/solutions/industry/big-data-and-apache-hadoop-government.
- Horton Works by Apache: http://hortonworks.com/industry/government/.
- Microsoft’s SQL Server: http://www.microsoft.com/SQLserver.
- SAS: http://www.sas.com/en_us/home.html.
- Tableau: http://www.tableau.com/.
- Batarseh F, Gonzalez AJ. Incremental Lifecycle Validation of Knowledge-Based Systems through CommonKADS. Published at the IEEE Transactions on Systems, Man and Cybernetics (TSMC-A); 2012.
- Batarseh F. Context-Driven Testing. In: Context in Computing: A Cross-Disciplinary Approach for Modeling the Real World. Patrick Brezillon, Editor (Sorbonne University- Paris VI). Springer Verlag; 2015. ISBN: 978-1-4939-1886-7.
- Scacchi W. Process Models in Software Engineering. Institute for Software Research, Encyclopedia of Software Engineering. 2nd Ed. John Wiley and Sons; 2001.
- Gartner’s Magic Quadrant: http://www.gartner.com.
- A Report by Actuate. Reporting and Data Analytics for Federal Applications. Actuate BIRT; 2014
- A Report by Salient. Federal Mission Software Solution. 2014.
- Information Builders. Predictive Analytics for Federal Government. New York: Product Brochure; 2011.
- Corporate Partner Advisory Group, Research Series. Leveraging Data Analytics in Federal Organizations, Report no. 30. 2012.
- Howard P, Potter C. Data Migration in the Global 2000 - Research, Forecasts and Survey Results. 2007. p. 29.
- Morris J. Practical Data Migration. 3rd ed. Swindon: British Informatics Society Ltd; 2006.
- Inmon W, Linstedt D. Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault. Morgan Kaufmann; 2014.
- A Report by IBM. The Hidden Costs of Data Migration - Strategies for Reducing Costs and Quickly Achieving Return on Investment. 2007.
- Wei B, Chen T. Verifying Data Migration Correctness: The Checksum Principle. RTI Press; 2014. OP-0019-1403.
- Department of Education, Office of Federal Student Aid. Data Migration Roadmap: A Best Practice Summary, Version 1.0. 2007.
- Wu B, Lawless D, Bisbal J, Richardson R, Grimson J, Wade V, Sullivan D. The Butterfly Methodology: A Gateway-free Approach for Migrating Legacy Information Systems. In: Proceedings of the 3rd IEEE International Conference on Engineering of Complex Computer Systems (ICECCS97). Como: Institute of Electrical and Electronics Engineers; 1997. p. 200–5.
- Haller K, Matthes F, Schulz C. Testing & Quality Assurance in Data Migration Projects. In: Proceedings of the 27th IEEE International Conference on Software Maintenance (ICSM). 2011.
- Thalheim B, Wang Q. Data Migration: A Theoretical Perspective. Data Knowl Eng. 2013;87:260–78.
- Spivak D. Functorial data migration. Inf Comput. 2012;217:31–51.
- Oracle’s Document: Move to Oracle Database with Oracle SQL Developer Migrations.
- Oracle White Paper. Migrating Oracle Databases. 2014.
- Haller K, Matthes F, Schulz C. Testing & Quality Assurance in Data Migration Projects. In: 27th IEEE International Conference on Software Maintenance (ICSM). 2011.
- Fine L, Keogh B, Cretin S, Orlando M, Gould M. How to Evaluate and Improve the Quality and Credibility of an Outcomes Database: Validation and Feedback Study on the UK Cardiac Surgery Experience. BMJ. 2003;326(7379):25–8.
- Flamos A, Doukas H, Psarras J. Data Validation Platform for the Sophisticated Monitoring and Communication of the Energy Technology Sector. Renew Energy. 2010;35(5):931–5.
- Scheier R. Data Migration Strategies and Best Practices. An Article Published at TechTarget.
- Burry C, Mancusi D. How to Plan for Data Migration. In: Proceedings of the Advanced Information Systems Engineering: 21st International Conference. 2004.
- Klazema A. Data Migration Strategies for Businesses. An Udemy Blog. http://blog.udemy.com/data-migration-strategies/.
- Levine R. Data Migration Strategies. A Web Report by Alta Flux Corporation; 2013.
- Anavi-Chaput V, Arrell K, Baisden J, Corrihons R, Fallon D, Siegmund L, Sokolof N. Planning for a Migration of PeopleSoft 7.5 from Oracle/UNIX to DB2 for OS/390. Poughkeepsie; 2000. p. 148.
- Woodall P, Borek A, Parlikad A. Data quality assessment: The Hybrid Approach. Info Manag. 2013;50(7):369–82.
- A Report by IBM. Best practices for data migration - Methodologies for assessing, planning, moving and validating data migration. Somers; 2009. p. 16.
- National Institute of Standards and Technology: WWW.NIST.GOV.
- Planning Report for NIST (US Department of Commerce). The Economic Impacts of Inadequate Infrastructure for Software Testing. 2002.
- Tan P-N, Steinbach M, Kumar V. Association Analysis: Basic Concepts and Algorithms. In: Introduction to Data Mining. Chapter 6. p. 327–414.
- Oracle’s Database Concepts Guide: From the Oracle Help Center.
- De Capitani di Vimercati S, Foresti S, Samarati P. Recent Advances in Access Control. In: Handbook of Database Security. 2008; pp. 1–26.
- Bertino E, Jajodia S, Samarati P. Database security: Research and practice. Information Systems. 1995.
- Hore B, Mehrotra S, Hacigümüç H. Managing and Querying Encrypted Data. In: Handbook of Database Security. 2008. p. 1–26.
- Oracle’s Database Advanced Security Administrator’s Guide: From the Oracle Help Center.