{"id":6575,"date":"2020-02-17T15:37:53","date_gmt":"2020-02-17T14:37:53","guid":{"rendered":"https:\/\/keplervision.eu\/?page_id=6575"},"modified":"2025-03-11T17:21:22","modified_gmt":"2025-03-11T16:21:22","slug":"deep-learning","status":"publish","type":"page","link":"https:\/\/keplervision.eu\/en\/deep-learning\/","title":{"rendered":"White Paper Deep Learning"},"content":{"rendered":"<!--themify_builder_content-->\n<div id=\"themify_builder_content-6575\" data-postid=\"6575\" class=\"themify_builder_content themify_builder_content-6575 themify_builder tf_clear\">\n                    <div  data-lazy=\"1\" class=\"module_row themify_builder_row tb_ioth476 tb_first tf_w\">\n                        <div class=\"row_inner col_align_top tb_col_count_1 tf_box tf_rel\">\n                        <div  data-lazy=\"1\" class=\"module_column tb-column col-full tb_84m6478 first\">\n                    <!-- module text -->\n<div  class=\"module module-text tb_vh3f478   \" data-lazy=\"1\">\n        <div  class=\"tb_text_wrap\">\n        <h1>White Paper On Best Practices for Software Development in Deep Learning<\/h1>\n<p>We congratulate our Kepler team member Per John for having his article on \u201cBest Practices for Software Development in a Deep Learning Environment\u201d published on Towards Data Science. Towards Data Science provides a platform for thousands of people to exchange ideas and to expand the understanding of data science.<\/p>\n<p>For convenience, a copy of the publication is provided below:<\/p>\n<h3>Software development: best practices in a deep learning environment<\/h3>\n<p>Deep learning systems are now being used extensively in many environments. 
They differ from traditional software systems in the way output is generated: the decisions made to produce results are learned from training data, rather than being hand-coded, as is the case with traditional software systems.<\/p>\n<p>This has led people to wrongly believe that software engineering is less necessary for machine learning systems, since the algorithms at the heart of any substantial system are learned by the system itself. Furthermore, state-of-the-art machine learning (ML) is advancing so rapidly that data scientists focus primarily on short-term goals like model precision. Given that the deep learning world will have changed within a matter of months, why waste time properly engineering your system today? Currently, the lie of the land in deep learning engineering is that a system is cobbled together through experimentation until some precision goal is achieved, and then put into production.<\/p>\n<p>Even though ML systems differ from traditional systems, this way of engineering them is pernicious for several reasons:<\/p>\n<ul>\n<li>A mistake can easily slip in and produce an incorrect precision figure: for instance, an error in data loading or transformation, or incorrect handling of the training\/test\/validation sets.<\/li>\n<li>Reproducing training results is costly, since the data and configuration used are often undocumented and unversioned.<\/li>\n<li>Running the ML model in a production environment is difficult, since the system has only been tested in a poorly documented developer environment.<\/li>\n<li>Adapting the ML system to changing requirements is also tricky, since maintainability of the code base was never a concern and the system lacks automated tests.<\/li>\n<\/ul>\n<p>Given these challenges, how do you engineer deep learning systems whilst avoiding the detrimental effects of throw-away coding and experimentation? 
In this article, we share the best practices developed and adopted at <a href=\"https:\/\/keplervision.eu\/\"><span style=\"text-decoration: underline;\">Kepler Vision Technologies<\/span><\/a>. They are rooted firmly in modern-day software development and account for the peculiarities of developing machine learning systems. So, what is special about developing deep learning systems compared with traditional systems?<\/p>\n<figure id=\"attachment_3335\" class=\"wp-caption alignleft\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3335\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3335\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg\" alt=\"\" width=\"400\" height=\"268\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-768x514.jpeg 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-1024x685.jpeg 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6.jpeg 1200w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/>\n<figcaption id=\"caption-attachment-3335\" class=\"wp-caption-text\">Photo by Shane Rounce on Unsplash<\/figcaption>\n<\/figure>\n<h3>The peculiarities of deep learning engineering<\/h3>\n<p>In this section we list the features of deep learning engineering that present major obstacles to the adoption of best practices from traditional software engineering. In the section that follows, we present the measures we took to make our deep learning development process more robust and efficient, overcoming (part of) the obstacles listed here.<\/p>\n<p><strong>1. Rapidly evolving deep learning frameworks<\/strong><br>At the dawn of the deep learning revolution, models were built directly on low-level libraries. 
Sometime later, specialized libraries began appearing to aid ML engineers, most notably the Caffe library, first released in 2013. Three years later, a plethora of other deep learning libraries had come into existence. Fast-forward to today, and the number of different libraries continues to increase, though a group of leading front-runners has formed. Besides the birth of new libraries, each library itself evolves rapidly as it tries to keep up with new developments in neural network research.<\/p>\n<p><strong>2. Data management<\/strong><br>A major characteristic of deep learning systems is that the amount of input data is much larger, as is the size of an individual data item, compared with traditional software systems. This is particularly true during the training phase, in which the amount of data used is often vast. Whilst data stores managed by traditional software systems can also become colossal, those systems can still be run and tested using just a fraction of the data. Training a deep learning system, on the other hand, can never succeed unless vast amounts of data are used.<\/p>\n<p><strong>3. Configuration management<\/strong><br>Managing configuration data is an intricate art in software engineering, and best practice is to version configuration data in such a way that it is clear which configuration belongs to which version of the source code. This best practice also holds for machine learning systems, but remains hard to implement, because the configuration data is so much larger than in traditional systems. After all, the weights of a trained neural network are configuration data and can easily add up to 1 GB.<\/p>\n<p>The so-called hyper-parameters used when training a deep learning model are also configuration data and must be managed accordingly. 
As it turns out, these hyper-parameters are often scattered around the code, tuned in such a way that the precision goal can be met.<\/p>\n<p><strong>4. Testing deep learning systems<\/strong><br>Test automation is now widely used in software development. In the deep learning world, however, test automation is not used to nearly the same extent. The reasons for this are twofold: first, deep learning developers are often not trained in modern software engineering practices, and second, developing a deep learning model requires a great deal of experimentation.<\/p>\n<p>Additionally, training a deep learning system is a non-deterministic process. This makes automated testing of the training process more difficult, since simple asserts on training outcomes will not be enough. Add the long training times, and it is easy to see why most deep learning engineers forgo automated testing.<\/p>\n<p>Automated testing of inferencing is more straightforward: once a model is trained, only a small amount of non-determinism remains, due to rounding errors.<\/p>\n<p><strong>5. Deployment of deep learning systems<\/strong><br>Two central features distinguish the deployment of deep learning systems from that of traditional systems: firstly, most deep learning systems need special hardware, a GPU, to run with enough throughput; secondly, deep learning systems require a large configuration file, the weight file, to be able to make predictions. This makes achieving dev\/prod parity, that is, a similar environment for development and production, more difficult. Moreover, it is unwise to package the weights file together with the source code in one deployment package because of its size. 
As a result, you need to deploy the weight files and the system as separate artifacts, whilst keeping them in sync.<\/p>\n<h3>Our approach to engineering deep learning systems<\/h3>\n<p>Our approach is based on the guiding principle that training and running a deep learning system should be automated to the greatest possible extent. It should not depend on an individual person training and evaluating the model after weeks of experimentation.<\/p>\n<p><strong>Using continuous integration to build Docker images<\/strong><br>We use Docker to specify the development and run-time environment of our deep learning systems. When we push changes to our version control system (in our case GitHub), Travis kicks in and runs the tests; if they pass, it builds the latest Docker images and pushes them to the Docker repository. The Docker images do not contain the model, as that would make them too big to distribute easily.<\/p>\n<p>Using Docker, the run-time environment is captured together with the code, and the system can be deployed and run reliably in any environment supporting Docker, regardless of the deep learning framework used.<\/p>\n<figure id=\"attachment_3336\" class=\"wp-caption alignleft\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3336\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3336\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-300x200.jpeg\" alt=\"\" width=\"400\" height=\"266\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-300x200.jpeg 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-768x511.jpeg 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-1024x682.jpeg 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv.jpeg 1200w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" 
\/>\n<figcaption id=\"caption-attachment-3336\" class=\"wp-caption-text\">Photo by elCarito on Unsplash<\/figcaption>\n<\/figure>\n<p><strong>Putting cloud first<\/strong><br>We strive to train and evaluate our models on an instance in the cloud (we use Amazon AWS), since this disentangles the platform details from running our system. The Docker image approach described above and Amazon AWS combine neatly to achieve this abstraction.<\/p>\n<p><strong>Storing data in a central location<\/strong><br>Data management is an integral part of deep learning systems. We store data and annotations in a central repository (AWS S3 buckets), and use scripts to sync data locally, such that our applications, jobs and developers have access to the latest data. Jobs always run on clean copies of the data, and we deem the time taken to download it well spent; this gives us certainty that the right training and test data is being used.<\/p>\n<p><strong>Storing weight files in a central location<\/strong><br>We store weight files in a central location and pull them in when needed to start an inferencing server or an evaluation job. The weight files are updated when a training job has produced an improved version of the model. Storing weight files centrally ensures that jobs and developers have easy access to the latest model.<\/p>\n<p><strong>Storing hyper-parameter configuration in a central location<\/strong><br>We store (changes to) the hyper-parameters used for training and evaluation in a configuration file. This prevents these parameters from becoming scattered around the source code. 
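To make this concrete, here is a minimal sketch of such a central configuration file and its loader, in Python. The file name, parameter names and values are purely illustrative, not our actual configuration:

```python
import json
from pathlib import Path

# Illustrative hyper-parameter file; every name and value below is made up.
EXAMPLE_CONFIG = """{
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 50,
    "backbone": "resnet50"
}"""

def load_hyperparameters(path):
    """Read all hyper-parameters from one central configuration file,
    instead of scattering them through the training code."""
    with open(path) as f:
        return json.load(f)

# A training job would first sync this file from the central store;
# here we simply write the example locally and read it back.
config_path = Path("hyperparameters.json")
config_path.write_text(EXAMPLE_CONFIG)
params = load_hyperparameters(config_path)
```

Training code then takes every tunable value from the loaded dictionary, so a job's configuration can be archived and versioned alongside its results.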
We use a central location to store these configuration files so that they are accessible to jobs and developers.<\/p>\n<figure id=\"attachment_3337\" class=\"wp-caption alignleft\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3337\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3337\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-300x192.png\" alt=\"\" width=\"400\" height=\"256\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-300x192.png 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-768x492.png 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-1024x656.png 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg.png 1194w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/>\n<figcaption id=\"caption-attachment-3337\" class=\"wp-caption-text\">Jenkins training job<\/figcaption>\n<\/figure>\n<p><strong>Consolidating training and evaluation into jobs<\/strong><br>We consolidate training and evaluation into jobs using the <a href=\"https:\/\/jenkins.io\/\" target=\"_blank\" rel=\"noopener\">Jenkins automation server.<\/a> The jobs pull in the latest Docker images, as well as the data and annotations, and will run training or evaluation inside a Docker container. 
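As an illustration, the volume layout of such a job container could be declared as follows. This is a hypothetical Docker Compose fragment: the image name and paths are made up, and our actual jobs are driven by Jenkins rather than Compose:

```yaml
# Hypothetical evaluation job: the image comes from CI and contains no model;
# data, annotations and weights are mounted read-only from the synced copies.
services:
  evaluate:
    image: example/dl-trainer:latest
    command: python evaluate.py --config /config/hyperparameters.json
    volumes:
      - ./data:/data:ro
      - ./annotations:/annotations:ro
      - ./models/latest.weights:/model/latest.weights:ro
      - ./config:/config:ro
```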
The Docker container has volumes attached which point to the data, the annotations and, in the case of evaluation, the model file.<\/p>\n<p>After a training job has completed, a weights file is produced, which is stored provided it meets the KPIs of the model.<\/p>\n<p>The job consolidates all the knowledge and data needed into scripts, so anyone can reliably train or evaluate a model without needing to understand its intricacies.<\/p>\n<p>We also archive the annotation files, configuration files and produced weight files so that the results of a job can be reproduced.<\/p>\n<p><strong>Using remote development features of PyCharm<\/strong><br>Developers are more productive when they can use an integrated development environment (IDE). Sometimes developers need to test and debug the source code and model on a GPU, which is often unavailable directly on their development machine. One way to combine using an IDE with running the code on a GPU is to start a Docker container on a machine with a GPU and then use PyCharm\u2019s support for a remote interpreter. A remote interpreter significantly reduces round-trip time compared with uploading source code to a remote server and adding logging statements for debugging.<\/p>\n<p><strong>Using test automation<\/strong><br>As explained above, writing effective and efficient automated tests is harder for deep learning systems than for traditional software systems. We do write automated tests for deep learning systems, using the pytest framework, but not as extensively as we would for traditional software systems.<\/p>\n<p>We focus on testing data input and important algorithms. We also write tests for training and evaluation, but only to the extent that the code runs smoothly. We do not check that a full training run produces the correct results, as this would be too time-consuming. 
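What we check can be sketched as a pytest-style smoke test. The `train` function below is a stand-in toy trainer written for this example, not our actual code:

```python
import random

def train(data, epochs, lr=0.1):
    """Stand-in toy trainer: fits y = w * x by SGD and reports a loss
    per epoch, mimicking the metrics a real training loop produces."""
    random.seed(0)  # pin the non-determinism for the test run
    w = random.random()
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            err = w * x - y
            total += err * err
            w -= lr * err * x  # gradient step for the squared error
        losses.append(total / len(data))
    return {"loss": losses}

def test_training_smoke():
    # One or two epochs on a very small data set: assert that metrics
    # come out, not what their values are.
    tiny_data = [(x, 2.0 * x) for x in range(4)]
    metrics = train(tiny_data, epochs=2)
    assert "loss" in metrics
    assert len(metrics["loss"]) == 2

test_training_smoke()
```

Under pytest the final call is unnecessary, as the framework collects any function named `test_*` automatically.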
Instead, we write an automated test that runs one or two epochs on a very small data set and then check that output metrics are produced, without asserting their values.<\/p>\n<p>The result of our test automation effort is that our code base is easier to understand and that changes require less effort to implement.<\/p>\n<p><strong>Measuring maintainability metrics<\/strong><br>A software system needs to be maintained in order to adapt to changing requirements and environments, and a deep learning system is no exception. The more maintainable a system is, the easier it is to modify and extend. We use bettercodehub.com to check 10 software engineering guidelines and to notify us when a system drifts off course. We prefer this tool over code violation tools since it lets us focus on the important maintainability guidelines, rather than having to wade through the vast numbers of violations that a tool like PyLint reports.<\/p>\n<p><strong>Conclusion<\/strong><br>Deep learning is a relatively new addition to the IT toolbox of organizations. It is partly rooted in traditional software development and partly in data science. Given that deep learning remains in its infancy, it is unsurprising that standard engineering practices have not yet been established. The development of deep learning software is, however, an integral part of software engineering, and we should draw inspiration from the principles of modern software engineering as a starting point for improving deep learning software engineering.<\/p>\n<p>The practices listed above have made our development process for deep learning models more robust and repeatable, and have created common ground for collaboration between deep learning practitioners and software engineers. 
The process is not perfect, as no process is, and we are continuously searching for ways to improve on our current working methods.<\/p>\n<p>Thanks to Rob van der Leek.<\/p>    <\/div>\n<\/div>\n<!-- \/module text -->        <\/div>\n                        <\/div>\n        <\/div>\n        <\/div>\n<!--\/themify_builder_content-->","protected":false},"excerpt":{"rendered":"<p>White Paper On Best Practices for Software Development in Deep Learning We congratulate our Kepler team member Per John for having his article on \u201cBest Practices for Software Development in a Deep Learning Environment\u201d published on Towards Data Science. Towards Data Science provides a platform for thousands of people to exchange ideas and to expand [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_improvement_type_select":"improve_an_existing","_thumb_yes_seoaic":false,"_frame_yes_seoaic":false,"seoaic_generate_description":"","seoaic_improve_instructions_prompt":"","seoaic_rollback_content_improvement":"","seoaic_idea_thumbnail_generator":"","thumbnail_generated":false,"thumbnail_generate_prompt":"","seoaic_article_description":"","footnotes":""},"class_list":["post-6575","page","type-page","status-publish","hentry","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>White Paper Deep Learning | Kepler Vision Technologies<\/title>\n<meta name=\"description\" content=\"Kepler Vision Technologies develops computer vision based software. 
Read here about the best practices in a deep learning environment.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/keplervision.eu\/en\/deep-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"White Paper Deep Learning | Kepler Vision Technologies\" \/>\n<meta property=\"og:description\" content=\"Kepler Vision Technologies develops computer vision based software. Read here about the best practices in a deep learning environment.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/keplervision.eu\/en\/deep-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Kepler Vision Technologies\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-11T16:21:22+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"803\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/\",\"url\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/\",\"name\":\"White Paper Deep Learning | Kepler Vision Technologies\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/keplervision.eu\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/0_wuN6nJiONvPVk4E6-300x201.jpeg\",\"datePublished\":\"2020-02-17T14:37:53+00:00\",\"dateModified\":\"2025-03-11T16:21:22+00:00\",\"description\":\"Kepler Vision Technologies develops computer vision based software. 
Read here about the best practices in a deep learning environment.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/keplervision.eu\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/0_wuN6nJiONvPVk4E6-300x201.jpeg\",\"contentUrl\":\"https:\\\/\\\/keplervision.eu\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/0_wuN6nJiONvPVk4E6-300x201.jpeg\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/deep-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"White Paper Deep Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/\",\"name\":\"Kepler Vision Technologies\",\"description\":\"TRUSTED, PROVEN TECHNOLOGIES SECURING CARE AVAILABILITY\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/keplervision.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"White Paper Deep Learning | Kepler Vision Technologies","description":"Kepler Vision Technologies develops computer vision based software. 
Read here about the best practices in a deep learning environment.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/keplervision.eu\/en\/deep-learning\/","og_locale":"en_US","og_type":"article","og_title":"White Paper Deep Learning | Kepler Vision Technologies","og_description":"Kepler Vision Technologies develops computer vision based software. Read here about the best practices in a deep learning environment.","og_url":"https:\/\/keplervision.eu\/en\/deep-learning\/","og_site_name":"Kepler Vision Technologies","article_modified_time":"2025-03-11T16:21:22+00:00","og_image":[{"width":1200,"height":803,"url":"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6.jpeg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/keplervision.eu\/en\/deep-learning\/","url":"https:\/\/keplervision.eu\/en\/deep-learning\/","name":"White Paper Deep Learning | Kepler Vision Technologies","isPartOf":{"@id":"https:\/\/keplervision.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/keplervision.eu\/en\/deep-learning\/#primaryimage"},"image":{"@id":"https:\/\/keplervision.eu\/en\/deep-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg","datePublished":"2020-02-17T14:37:53+00:00","dateModified":"2025-03-11T16:21:22+00:00","description":"Kepler Vision Technologies develops computer vision based software. 
Read here about the best practices in a deep learning environment.","breadcrumb":{"@id":"https:\/\/keplervision.eu\/en\/deep-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/keplervision.eu\/en\/deep-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/keplervision.eu\/en\/deep-learning\/#primaryimage","url":"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg","contentUrl":"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/keplervision.eu\/en\/deep-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/keplervision.eu\/en\/"},{"@type":"ListItem","position":2,"name":"White Paper Deep Learning"}]},{"@type":"WebSite","@id":"https:\/\/keplervision.eu\/en\/#website","url":"https:\/\/keplervision.eu\/en\/","name":"Kepler Vision Technologies","description":"TRUSTED, PROVEN TECHNOLOGIES SECURING CARE AVAILABILITY","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/keplervision.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"builder_content":"<h1>White Paper On Best Practices for Software Development in Deep Learning<\/h1> <p>We congratulate our Kepler team member Per John for having his article on \u201cBest Practices for Software Development in a Deep Learning Environment\u201d published on Towards Data Science. 
Towards Data Science provides a platform for thousands of people to exchange ideas and to expand the understanding of data science.<\/p> <p>For convenience, a copy of the publication is provided below:<\/p> <h3>Software development: best practices in a deep learning environment<\/h3> <p>Deep learning systems are now being used extensively in many environments. They differ from traditional software systems in the manner through which output is generated: decisions which are made to produce results are learned from training data, rather than being hand-coded, as is the case with traditional software systems.<\/p> <p>This has led people to wrongly believe that software development is not as necessary for machine learning systems, since the algorithms, which are at the heart of any substantial system, are learned by the system itself. Furthermore, state-of-the-art machine learning (ML) is advancing so rapidly that data scientists are focusing primarily on short term goals like model precision. Given that the deep learning world will have changed within a matter of months, why waste time properly engineering your system today? Currently, the lie of the land in deep learning engineering is that a system is experimented together until some precision goal is achieved, and then is put into production.<\/p> <p>Even if different from traditional systems, this method of engineering a ML system is pernicious for several reasons:<\/p> <ul> <li style=\"list-style-type: none;\"> <ul> <li>A mistake can easily be made, thereby producing an incorrect precision. 
For instance, because of an error in data loading or transformation, or the incorrect handling of training\/test\/validation sets.<\/li> <li>Reproducing training results is time-costly since the data and configuration used is often undocumented and not versioned.<\/li> <li>Running the ML model in a production environment is difficult since the system is only tested in a poorly documented developer environment.<\/li> <li>Adapting the ML system to changing requirements is also tricky since maintainability of the code base was not a concern, and the system lacks automated tests.<\/li> <\/ul> <\/li> <\/ul> <p>Given these challenges, how do you engineer deep learning systems whilst avoiding the detrimental effects of throw-away coding and experimentation? In this article, we share the best practices developed and adopted at <a href=\"https:\/\/keplervision.eu\/\">Kepler Vision Technologies<\/a>. These practices are rooted firmly in modern-day software development practices and account for the peculiarities of developing machine learning systems. 
So, what is special to developing deep learning systems, compared with traditional systems?<\/p> <figure id=\"attachment_3335\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3335\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg\" alt=\"\" width=\"400\" height=\"268\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-300x201.jpeg 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-768x514.jpeg 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6-1024x685.jpeg 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_wuN6nJiONvPVk4E6.jpeg 1200w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/> <figcaption id=\"caption-attachment-3335\">Photo by Shane Rounce on Unsplash<\/figcaption> <\/figure> <h3>The peculiarities of deep learning engineering<\/h3> <p>In this section we list the features of deep learning engineering that present major obstacles to the adoption of best practices from traditional software engineering. In the section following this section, we present the measures we took to make our deep learning development process more robust and efficient, overcoming (part of) the obstacles listed here.<\/p> <p><strong>1. Rapidly evolving deep learning frameworks<\/strong><br>At the dawn of the deep learning revolution, models were built based on low-level libraries. Sometime later, specialized libraries began appearing to aid ML engineers, most notably with the launch of the Caffe library, founded in 2013. Three years later, a plethora of other deep learning libraries had come into existence. Scrolling forward to today, the number of different libraries continues to increase, though a group of leading front-runners has formed. 
Besides the birth of new individual libraries, each library itself must constantly be subjected to rapid evolution as they try to keep up with new developments in neural network research.<\/p> <p><strong>2. Data management<\/strong><br>A major characteristic of deep learning systems is that the amount of input data is much larger, as is the size of an individual data item, compared with traditional software systems. This is particularly true during the training phase, in which the amount of data used is often highly substantial. Whilst it is also true that data stores that are managed by traditional software systems can become colossal in size, these systems can still be run and tested by using just a fraction of the given data. Training a deep learning system, on the other hand, can never be successful unless vast amounts of data are utilized.<\/p> <p><strong>3. Configuration management<\/strong><br>Managing configuration data is an intricate art in software engineering, and best practice is that configuration data is versioned in such a way that it is clear which configuration data belongs to which version of the source code. This best practice also holds for machine learning systems, but remains hard to implement, as the size of the configuration data is so large in comparison with traditional systems. After all, the weights of a trained neural network are configuration data and can easily measure up to a total of 1Gb.<\/p> <p>The so-called hyper-parameters used when training a deep learning model are also configuration data and must be managed in kind. As it turns out, these hyper-parameters are often scattered around the code, tuned in such a way that the precision goal can be met.<\/p> <p><strong>4. Testing deep learning systems<\/strong><br>Test automation is now widely used in software development. In the deep learning world, however, test automation is not used to nearly the same extent. 
The reasons for this are twofold: first, deep learning developers are not properly trained in modern software engineering practices, and second, developing a deep learning model requires a great deal of experimentation.<\/p> <p>Additionally, training a deep learning system is a non-deterministic process. This makes automated testing of the training process more difficult, since simple asserts on training outcomes will not be enough. Take into account the long training times, and it is easy to see why most deep learning engineers forgo automated testing.<\/p> <p>Automated testing of inferencing is more straightforward. Once a model is trained, only small amounts of non-determinism remain, due mainly to rounding errors.<\/p> <p><strong>5. Deployment of deep learning systems<\/strong><br>Two central features distinguish deployment of deep learning systems from traditional systems: Firstly, most deep learning systems need special hardware, a GPU, to run with enough throughput. Secondly, deep learning systems require a large configuration file, the weights file, to be able to make predictions. This makes achieving dev\/prod parity, that is, a similar environment for development and production, more difficult. Moreover, it is unwise to package the weights file together with the source code in one deployment package because of its size. As a result, you need to deploy the weight files and the system as separate artifacts, whilst keeping them in sync.<\/p> <h3>Our approach to engineering deep learning systems<\/h3> <p>Our approach is based on the guiding principle that training and running a deep learning system should be automated to the greatest possible extent. It should not depend on an individual person needing to train and evaluate the model after weeks of experimentation.<\/p> <p><strong>Using continuous integration to build Docker images<\/strong><br>We use Docker to specify the development and run-time environment of our deep learning systems. 
When we push changes to our version control system (in our case GitHub), Travis CI kicks in and runs the tests; if they pass, it builds the latest Docker images and pushes them to the Docker repository. The Docker images do not contain the model, as that would make them too big to distribute easily.<\/p> <p>Using Docker, the run-time environment is fixed together with the code, and it is possible to deploy and run the system reliably in any environment supporting Docker, regardless of the deep learning framework used.<\/p> <figure id=\"attachment_3336\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3336\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-300x200.jpeg\" alt=\"\" width=\"400\" height=\"266\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-300x200.jpeg 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-768x511.jpeg 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv-1024x682.jpeg 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/0_cj2-qxaGC368JOxv.jpeg 1200w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/> <figcaption id=\"caption-attachment-3336\">Photo by elCarito on Unsplash<\/figcaption> <\/figure> <p><strong>Putting cloud first<\/strong><br>We strive to train and evaluate our models on an instance in the cloud (we use Amazon AWS), since this disentangles the platform details from running our system. The Docker image approach described above and Amazon AWS combine neatly to achieve this abstraction.<\/p> <p><strong>Storing data in a central location<\/strong><br>Data management is an integral part of deep learning systems. 
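<\/p>
<p>As one concrete illustration of what such a setup can look like, the sync step can be a thin wrapper around the cloud provider's CLI. This is a hedged sketch only: the bucket name, prefix and use of the AWS CLI below are assumptions, not a description of any particular implementation.<\/p>

```python
import subprocess


def build_sync_command(bucket, prefix, local_dir):
    """Return the `aws s3 sync` invocation that mirrors the central data
    store into a local directory; `--delete` removes stale local files."""
    return ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", local_dir, "--delete"]


def sync_dataset(bucket, prefix, local_dir):
    """Run the sync. Requires the AWS CLI and credentials; raises on failure."""
    subprocess.run(build_sync_command(bucket, prefix, local_dir), check=True)
```

<p>Keeping the command in one small function means jobs and developers fetch data in exactly the same way.<\/p>
<p>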
We store data and annotations in a central repository (AWS S3 buckets), and use scripts to sync data locally, so that our applications, jobs and developers have access to the latest data. Jobs always run on clean copies of the data, and we consider the time taken to download the data well spent; this gives us certainty that the right training and test data is being used.<\/p> <p><strong>Storing weight files in a central location<\/strong><br>We store weight files in a central location and pull them in when needed to start an inferencing server or an evaluation job. The weight files are updated when a training job has produced an improved version of the model. Storing weight files centrally ensures that jobs and developers have easy access to the latest model.<\/p> <p><strong>Storing hyperparameter configuration in a central location<\/strong><br>We store (changes to) the hyperparameters used for training and evaluation in a configuration file. This prevents these parameters from becoming scattered around the source code. 
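<\/p>
<p>A minimal sketch of what reading such a configuration file can look like (the file format, class and parameter names here are invented for illustration):<\/p>

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """All hyperparameters for one training run, read from one versioned file."""
    learning_rate: float
    batch_size: int
    epochs: int


def load_config(path):
    """Load hyperparameters from a JSON file instead of scattering them in code."""
    with open(path) as f:
        return TrainingConfig(**json.load(f))
```

<p>Because every run reads from one file, the exact hyperparameters of a run can be archived alongside the weights it produced.<\/p>
<p>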
We use a central location to store these configuration files so that they are accessible to jobs and developers.<\/p> <figure id=\"attachment_3337\" style=\"width: 400px;\" aria-describedby=\"caption-attachment-3337\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-300x192.png\" alt=\"\" width=\"400\" height=\"256\" srcset=\"https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-300x192.png 300w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-768x492.png 768w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg-1024x656.png 1024w, https:\/\/keplervision.eu\/wp-content\/uploads\/2019\/04\/1_zKsgmiQ66tpicgu-ppA5Tg.png 1194w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/> <figcaption id=\"caption-attachment-3337\">Jenkins training job<\/figcaption> <\/figure> <p><strong>Consolidating training and evaluation into jobs<\/strong><br>We consolidate training and evaluation into jobs using the <a href=\"https:\/\/jenkins.io\/\" target=\"_blank\" rel=\"noopener\">Jenkins automation server.<\/a> The jobs pull in the latest Docker images, as well as the data and annotations, and will run training or evaluation inside a Docker container. 
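<\/p>
<p>As a hedged sketch, the container invocation inside such a job might be assembled like this; the image name, mount paths and entry point are invented for the example:<\/p>

```python
import subprocess


def build_docker_run(image, data_dir, annotations_dir, output_dir):
    """Return a `docker run` command that mounts data and annotations
    read-only, and an output directory for the produced artifacts."""
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{data_dir}:/data:ro",
        "-v", f"{annotations_dir}:/annotations:ro",
        "-v", f"{output_dir}:/output",
        image, "python", "train.py",
    ]


def run_training_job(image, data_dir, annotations_dir, output_dir):
    """Run the container. Requires Docker (and the NVIDIA runtime for --gpus)."""
    subprocess.run(build_docker_run(image, data_dir, annotations_dir, output_dir), check=True)
```

<p>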
The Docker container has volumes attached which point to the data, the annotations and, in the case of evaluation, the model file.<\/p> <p>After a training job has completed, a weights file is produced, which is stored provided it meets the model\u2019s KPIs.<\/p> <p>The job consolidates all the knowledge and data needed into scripts, so anyone can reliably train or evaluate a model without needing to understand the intricacies of the model.<\/p> <p>We do, however, archive the annotation files, configuration files and produced weight files so that the results of a job can be reproduced.<\/p> <p><strong>Using remote development features of PyCharm<\/strong><br>Developers are more productive when they can use an integrated development environment (IDE). Sometimes developers need to test and debug the source code and model on a GPU, which is often unavailable on their development machine. One way to keep using an IDE while running the code on a GPU is to start a Docker container on a machine with a GPU and then use PyCharm\u2019s support for a remote interpreter. A remote interpreter significantly reduces round-trip time compared with uploading source code to a remote server and adding logging statements for debugging.<\/p> <p><strong>Using test automation<\/strong><br>As explained above, writing effective and efficient automated tests is harder for deep learning systems than for traditional software systems. We do write automated tests for deep learning systems, using the pytest framework, but not as extensively as we would for traditional software systems.<\/p> <p>We focus on testing data input and important algorithms. We also write tests for training and evaluation, but only to the extent that the code runs smoothly. We do not check that a full training run produces the correct results, as this would be too time-consuming. 
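<\/p>
<p>A cheaper check is still possible. The sketch below shows the shape of such a smoke test; the toy gradient-descent loop is an invented stand-in for a real model, and pytest is assumed as the runner:<\/p>

```python
import random


def train(data, epochs=2, lr=0.1):
    """Toy stand-in for a training loop: fit y = w * x by gradient
    descent and return the metrics a real run would log."""
    w = random.random()  # non-deterministic initialization, as in real training
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return {"loss": loss, "weight": w}


def test_training_smoke():
    """Run two epochs on a tiny data set; check that metrics are
    produced, but do not assert their (non-deterministic) values."""
    tiny_data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
    metrics = train(tiny_data, epochs=2)
    assert set(metrics) == {"loss", "weight"}
    assert metrics["loss"] >= 0.0
```

<p>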
Instead, we write an automated test that runs one or two epochs on a very small data set, and then check that output metrics are produced, without asserting their values.<\/p> <p>The result of our test automation effort is that our code base is easier to understand and that changes require less effort to implement.<\/p> <p><strong>Measuring maintainability metrics<\/strong><br>A software system needs to be maintained in order to adapt to changing requirements and environments, and a deep learning system is no exception. The more maintainable a system is, the easier it is to modify and extend. We use bettercodehub.com to check 10 software engineering guidelines and to notify us when a system drifts off course. We prefer this tool over code-violation tools since it allows us to focus on the important maintainability guidelines, rather than having to deal with the vast number of violations that a tool like PyLint reports.<\/p> <p><strong>Conclusion<\/strong><br>Deep learning is a relatively new addition to the IT toolbox of organizations. It is partly rooted in traditional software development and partly in data science. Given that deep learning is still in its infancy, it is unsurprising that standard engineering practices have not yet been established. The development of deep learning software is, however, an integral part of software engineering, and we should draw inspiration from the principles of modern software engineering as a starting point for improving deep learning software engineering.<\/p> <p>The practices listed above have made our development process for deep learning models more robust and repeatable, and have created a common ground for collaboration between deep learning practitioners and software engineers. 
The process is not perfect, as no process is, and we are continuously searching for ways to improve on our current working methods.<\/p> <p>Thanks to Rob van der Leek.<\/p>","_links":{"self":[{"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/pages\/6575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/comments?post=6575"}],"version-history":[{"count":18,"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/pages\/6575\/revisions"}],"predecessor-version":[{"id":29704,"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/pages\/6575\/revisions\/29704"}],"wp:attachment":[{"href":"https:\/\/keplervision.eu\/en\/wp-json\/wp\/v2\/media?parent=6575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}