
Monday, May 29, 2017

Big Data: Getting information from data

Big Data is all around us. But there are pitfalls in converting this growing avalanche of data into useful information.

Two decades ago, we saw the early introduction of data collection and storage systems into industrial settings. My own experience with these systems, such as the PI data historian system from OSIsoft, was in the pulp and paper industry, but the prevalence of these systems crosses sector boundaries. The purpose was to look at trends in all the different sensor readings, control actions and actuator settings in an industrial process, in order to attempt to draw conclusions from the accumulated machine records that might not be immediately obvious to operators or technical staff. These conclusions could then presumably be used to reduce operating costs or improve product quality.

Growth of these systems tracked the falling cost of computer data storage. With memory chips and hard drives being expensive early in the computer era, operating data was often overwritten daily if not hourly. A study I ran in the late 1990s [see reference below] had access to 36 months' worth of industrial data, collected daily by a mill engineer and averaged monthly: 36 data points to describe three years of operations. Today, a year's worth of industrial process data from a full-scale plant amounts to terabytes, but can be stored indefinitely at a cost of a few dollars.
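Just to put that scale in perspective, here is a quick back-of-the-envelope calculation; the tag count and sampling rate are assumptions chosen for illustration, not figures from any particular mill:

```python
# Back-of-envelope estimate of annual process-data volume.
# Assumed figures: 10,000 instrument tags, one 8-byte reading per second each.
TAGS = 10_000            # assumed number of logged sensors, actuators and setpoints
BYTES_PER_SAMPLE = 8     # one double-precision value
SECONDS_PER_YEAR = 365 * 24 * 3600

raw_bytes = TAGS * BYTES_PER_SAMPLE * SECONDS_PER_YEAR
print(f"Raw data per year: {raw_bytes / 1e12:.1f} TB")  # roughly 2.5 TB before compression
```

Compare that with the 36 monthly averages above, and the scale of the change becomes obvious.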

In parallel with the rise of data historians and archiving systems, we saw the growth of analytical processes to try to make sense of this new avalanche of data. A quick Google search yields an enormous list of textbooks and articles on the topic of data mining and so-called 'statistical learning'; some are even sold by Amazon as well as publishers such as Springer-Verlag or Wiley.
Figure 1: Data from Nature, as used to illustrate correlation versus cause and effect by the late Prof. Martin Weber, Dep't of Chemical Engineering, McGill University, ca. 1990.
The problems that I have run into in a couple of cases are due to the data analyst not properly understanding the process in question, and coming up with recommendations which neglect underlying reasons (technical, economic, product quality, environmental permit constraints, etc.) for operating a particular mill in a particular fashion. In the worst cases, the old issue of correlation versus cause and effect rears its ugly head.
Figure 2: In this correlation, each stork is associated with approximately one live birth per working day (5 days per week, 50 weeks per year). 
In one case, data mining applied to an industrial data set was used to show that the value of a particular variable tended to swing over a very wide range. No good reason could be identified from the data as to why the variable should move around so much. A mild correlation with certain operating inputs (feedstock, water, energy, chemicals) was used to suggest that reducing the variability in this particular variable would help improve overall operating costs.

In fact, the variable in question was manipulated by a control algorithm whose objective was to maintain the minimum product quality needed to satisfy customers in the face of raw material variability, while using the least amount of raw materials and process inputs necessary to do so. Reduced quality meant lost or heavily discounted sales, with a much bigger impact on the bottom line than the slight increase in operating costs occasionally required to maintain product quality in the face of the inevitable disturbances. The quality-related variables were not flagged by the data mining process, since they did not move at all and were in fact uncorrelated with any other variable. This was actually a good thing, as it demonstrated a well-designed control algorithm: yes, the manipulated variable moves around a lot, but the result is that the controlled variable (product quality) stays stable.
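A toy simulation illustrates the point. This is a minimal sketch with invented numbers (not the actual mill data discussed above): a simple feedback controller absorbs a drifting raw-material disturbance, so the manipulated variable swings widely while product quality barely moves, and a naive correlation screen would flag the 'noisy' variable rather than the quiet one that really matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
target = 80.0                                     # quality setpoint (arbitrary units)
disturbance = np.cumsum(rng.normal(0, 0.3, n))    # drifting raw-material quality

quality = np.zeros(n)
chemical_dose = np.zeros(n)                       # manipulated variable, e.g. bleach addition
dose = 10.0
for t in range(n):
    chemical_dose[t] = dose
    # toy process model: quality responds to the dose and to the disturbance
    quality[t] = target + (dose - 10.0) - disturbance[t] + rng.normal(0, 0.1)
    # integral-only feedback: nudge the dose to pull quality back to target
    dose += 0.8 * (target - quality[t])

print("std of manipulated variable:", chemical_dose.std().round(2))   # swings a lot
print("std of product quality:     ", quality.std().round(2))         # nearly flat
print("corr(dose, quality):        ", np.corrcoef(chemical_dose, quality)[0, 1].round(2))
```

A data-mining pass over this record sees a chemical dose that moves all over the place and a quality variable that correlates with nothing, which is exactly what a well-tuned control loop is supposed to produce.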

So this is an example where faulty information was extracted from big data, and it shows the importance of understanding the difference between industrial data (where a controller or an operator may have his or her hand on a valve) and lab-generated data (which covers every possible combination of all variables, even combinations that will never be implemented industrially). The study mentioned above (see the reference below) provides another example: mill data showed conclusively that using less bleach led to brighter paper, a correlation which is clearly nonsense on its own, but which makes sense in context.

One thing I have learned from helping to develop process control systems is this: it is very hard to find a good process control person. The best approach is to find a good control person (of whom there are many), and pair him with a good process person (also lots of good people out there). I would suggest the same is true of any of the broad new tools for data analysis (mining, process modeling and integration, etc.): there are lots of analytics people out there, but for best effect, they need to be paired with a process expert who understands the underlying chemistry and physics, as well as the business context.

I have focused here on industrial challenges. Today we are seeing data mining and analysis tools applied to any sphere where lots of data exists. For instance, Google (click here) collects staggering amounts of data every second (including data about who reads this blog, and what else those readers might be interested in), and presumably is building analytical systems to extract information about the world from all this data. (Should we worry that Google knows so much about us? For discussion outside this post...)

So while this post has discussed data in an industrial setting, perhaps it is worth asking if the same challenges exist in the social or other applications of data analysis, and if so, how are they being managed? Big Questions around Big Data.


Reference: Browne, T.C., Miles, K.B., McDonald, J.D. and Wood, J.R., “Multivariate Analysis of Seasonal Pulp Quality Variations in a TMP Mill”, Pulp Pap Can 105(10):35-39 (October 2004).

Wednesday, May 24, 2017

Triage process for selecting bio-economy projects, Part II

I went through some initial triage items last week, covering Technology, Markets and Economics. These are all data-driven and can be established fairly solidly, with numerical values and probabilities. There are some softer, more touchy-feely items that need to be looked at as well:

Partners

Are your partners keen? Do they have cash? Are they known as innovators? Do they have a history of successful partnerships, or do they tend to let the lawyers bog everything down in minutiae? Some very innovative companies also suffer from Not Invented Here syndrome -- outside ideas don't get far. (These tend to be $50 billion companies, with thousands of researchers on staff.)

Be aware of your own blind spots. When you have a hammer, everything looks like a nail; I have known people and organisations who tend to see the world through the lens of their own expertise. The control engineer sees everything as a control problem; the sensor designer thinks all that is needed is a new sensor. In reality you need both sensors and control systems. In that context, are there partners that you should have involved early? Finance, equipment vendors, technology providers, research providers, raw material suppliers and (especially) end-users can all provide critical knowledge and support.

Internal capacity

Do not assume you can do it all on your own! Apart from the partners listed above, do you have the specialised know-how to do this? If not, what is needed: hire a post-doctoral fellow (PDF, which is not an acronym for a type of document file in this context), support a university project, work with a research lab, partner with a specialist?

What about infrastructure: do you have the necessary specialised lab equipment, pilot plant space, etc.? If not, can you rent access (for an example, click here)? At what cost? Include logistics (costs for shipping material, travel costs to witness trials, probability trials will be inconclusive and will need to be repeated, etc.).

Path forward

If you've made it this far, set out a timeline and budget. List potential milestones for a rigorous go/no-go approach, and be prepared to kill it if it starts to falter. There are several milestone techniques out there, the best known of which is the stage-gate process put together by Robert Cooper. You can buy the book and implement it yourself if you don't want to pay for consulting (click here for more info). 

Frogs and toads

That's it! Well, OK, there is more, but this is a good start. Building up a set of tools, perhaps in Excel, is a useful and quick approach to keeping track of the triage process, especially if you update it as you go along. As metrics improve or worsen, you need to be nimble in deciding to stay the course, make some significant changes, or ditch Plan A and move to Plan B or C.
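To make the bookkeeping concrete, here is a minimal sketch of the kind of scorecard I have in mind, whether it lives in Excel or in a few lines of code; the criteria, weights and scores below are placeholders for illustration, not recommendations:

```python
# Minimal triage scorecard: weighted scores per criterion, updated as new information arrives.
# All criteria, weights and project scores are illustrative placeholders.
weights = {"technology": 0.30, "market": 0.30, "economics": 0.25, "partners": 0.15}

projects = {
    "Plan A - lignin extraction": {"technology": 8, "market": 6, "economics": 7, "partners": 8},
    "Plan B - sugar platform":    {"technology": 6, "market": 7, "economics": 5, "partners": 6},
    "Plan C - bio-carbon":        {"technology": 5, "market": 4, "economics": 4, "partners": 7},
}

def score(criteria):
    """Weighted average on a 0-10 scale."""
    return sum(weights[k] * v for k, v in criteria.items())

for name, criteria in sorted(projects.items(), key=lambda kv: -score(kv[1])):
    print(f"{name:30s} {score(criteria):.1f}")
```

The value is not in the arithmetic; it is in being forced to revisit the scores as new data arrives, so the decision to stay the course or switch plans is made deliberately rather than by inertia.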

One last pointer to help in your triage efforts. This is a frog:

[photo of a frog]

And this is a toad:

[photo of a toad]

Good luck! Let me know how it goes.

Thursday, May 18, 2017

Triage process for selecting bio-economy projects, Part I

When I worked at FPInnovations as Research Manager for the Biorefinery program, I had a constant dilemma: Lots of good ideas, but never enough resources (time, money, staff) to explore them all.

John Williams, CEO of Domtar, is fond of saying about diversifying into the bio-economy that he's kissing frogs, hoping to find a prince. I think it's a great analogy, but you can kiss an awful lot of frogs and not get anywhere. Ensuring you have a decent selection of frogs to start with is critical. And you want to weed out the toads, which will only give you warts. So how to sort through all the frogs?

As manager, I had frogs coming at me from all directions. First and foremost, they came from the scientific and engineering staff at FPInnovations, who were (and hopefully still are!) very curious, motivated people, excited about moving new processes and product lines into the existing Canadian forest sector. I could count on at least one idea per hour from these guys. (OK, OK, I'm exaggerating. Maybe one a day.) Then there were the people employed by forest sector companies, who are overworked and constantly approached by start-up firms with a great idea that just needs some cash. Finally there were ideas from university professors, government labs and funding agencies, conversations and presentations at conferences, etc. It can all be a bit overwhelming, and I gather from ongoing conversations with people in the field that it remains challenging.

So are you awash in frogs? Worried you might be stuck with a bunch of toads? I thought I'd outline some of the approaches I developed in partnership with colleagues. Initially the point of view was that of a not-for-profit research institute, but I have tried to rewrite it here so it could also be used by a government agency, a for-profit industrial player, or anyone else interested in evaluating which frogs to kiss.

The overall approach, which led to the LignoForce lignin extraction plant at West Fraser's mill in Hinton, Alberta, involves picking a Plan A (selected by triaging a broader set of ideas), then focusing on delivering. A couple of backup plans (Plan B and C) should be identified in case Plan A falls through, but should only take a small portion of the overall effort. An initial list of criteria for the triage process follows. Note that not all points need to be addressed in detail for all projects, but you should skim through to make sure there isn't a deal-breaker lurking in there somewhere.

Fuels versus value-added 

In my view, the first step is to separate bio-fuels and bio-energy projects from pathways leading to platform chemicals, materials or intermediates that are presently made from petroleum. Hauling wood out of the bush and turning it all into fuels, without a concurrent value-added pathway, is only really economic in very narrow circumstances or in the presence of the appropriate politically-supported carrots (renewable fuel standards) or sticks (carbon taxes). As a result, there are additional criteria for these projects which I will not get into here.

Technology

Do the claims seem reasonable from a scientific basis? Anything that appears to violate laws of thermodynamics or conservation of mass should be looked at with great skepticism. The same goes for paths that seem to ignore the basic chemistry of wood (or whatever your bio-based feedstock is).

Estimate the yield (kg of product per dry tonne of wood consumed). Equally important is what happens to the yield losses (residues). These numbers are essential for subsequent steps.
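A quick worked example of the bookkeeping, with invented figures (no real process implied):

```python
# Mass balance per oven-dry tonne (odt) of wood; all figures are illustrative assumptions.
wood_in = 1000.0          # kg dry wood fed
product_yield = 0.40      # assume 40% of the dry mass ends up in the product

product_kg = wood_in * product_yield   # 400 kg of product per odt
residue_kg = wood_in - product_kg      # 600 kg of residues per odt
print(f"Product: {product_kg:.0f} kg/odt, residues: {residue_kg:.0f} kg/odt")
# Those 600 kg of residues have to go somewhere: fuel, a co-product, or a disposal cost.
```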

Evaluate the patent landscape. Are there existing patents out there? Can a deal be made to license the patents from the owners, and at what cost? If we are free to develop and operate this proposed process without infringing someone's patent, might there be an opportunity to build a patent position that would protect all this and give us an advantage? At what cost?

Markets

Most importantly, who would want this? Pushing a new product into the market is like pushing on a rope; far better to have some serious market pull.

Are you replacing an existing product or are you proposing something which is completely new to the world? In the first case you need to worry about incumbents; in the second, it is a major challenge to convince someone with an existing profitable product line that he needs your new material.

If you are proposing some form of drop-in replacement, what is the likely product quality compared with the incumbent? Is it better, the same, worse?

Assuming product quality is decent, what is the likely market in terms of tonnes (NAFTA, world-wide), and at what typical list prices? Given transportation costs and discounts you may need to offer to volume buyers, what is the likely mill-gate price? Will a further discount be needed to account for poorer (or different) product performance characteristics? If so, how big a discount?

At full scale, what percentage of this market would a new plant occupy? If the answer is a large number, you will need to consider what existing players might do to protect their turf (lower their prices, for instance, to drive you out of business). Sneaking into the market with capacity of 0.5% of world demand is safer.
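Here is a rough sketch of how those numbers chain together; every figure is a made-up placeholder, and a real assessment would use market intelligence rather than round numbers:

```python
# From list price to mill-gate price and market share; all inputs are assumptions.
world_market_t   = 1_000_000   # tonnes/year of the incumbent product
list_price       = 1200.0      # $/tonne
volume_discount  = 0.10        # discount needed to win large buyers
quality_discount = 0.05        # penalty for a drop-in that performs slightly worse
freight          = 80.0        # $/tonne to reach the customer

mill_gate_price = list_price * (1 - volume_discount - quality_discount) - freight
plant_capacity_t = 20_000      # tonnes/year for the proposed plant

print(f"Mill-gate price: ${mill_gate_price:.0f}/t")                       # $940/t here
print(f"Share of world market: {plant_capacity_t / world_market_t:.1%}")  # 2.0% here
```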

Understand the incumbents: They have a lot to lose and may have sneaky ways to keep you out. Alternatively, in a market with several players, one may be interested in partnering as a way of keeping ahead of the competition. Monopoly markets have their own challenges.

Economics

Given the yields and process details, what are the likely operating costs (chemicals, energy, wood) per tonne of product? Start with variable costs; you'll need to consider fixed costs eventually, but if it doesn't work with your variable costs, no point in digging deeper. 

What are the maximum possible gross revenues at steady state and full scale? Can you take a crude first pass at capital costs? From this, a crude first pass at internal rate of return (IRR) or Return on Capital Employed (ROCE) can provide some guidance. If it is poor even in an optimistic framework, can the technology be improved to be more effective? A sensitivity analysis on the major cost drivers will show where opportunities might exist. Eventually a pro-forma showing revenue ramp-up over several years will give a more accurate estimate of payback time.
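The same logic, in code form, might look like the sketch below; the costs, prices and capital figures are placeholders (the mill-gate price is carried over from the market sketch above), and the ROCE is crudely based on EBITDA rather than a proper pro-forma:

```python
# Crude first pass at project economics; all inputs are illustrative assumptions.
def roce(price, var_cost, capital, capacity=20_000, fixed=4_000_000):
    """Return on capital employed, crudely approximated as EBITDA / capital."""
    ebitda = capacity * (price - var_cost) - fixed
    return ebitda / capital

base = roce(price=940.0, var_cost=600.0, capital=60e6)
print(f"Base-case ROCE: {base:.1%}")   # about 4.7% -- poor, so improve the case or move on

# One-at-a-time sensitivity, +10% on each major input:
print(f"Price +10%:     {roce(940.0 * 1.1, 600.0, 60e6):.1%}")
print(f"Var. cost +10%: {roce(940.0, 600.0 * 1.1, 60e6):.1%}")
print(f"Capital +10%:   {roce(940.0, 600.0, 60e6 * 1.1):.1%}")
```

Even at this level of crudeness, the sensitivity lines show which lever (price, operating cost or capital) moves the answer most, and therefore where the development effort should go.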

To be continued...

So far I have covered items where data analysis and research can lead to relatively hard numbers, with reasonably well-understood probabilities and risk factors. Next week I'll outline some approaches to risk factors which can't be so easily quantified in numerical terms, but which are equally important. Stay tuned! 

Thursday, May 11, 2017

1st International Forest Biorefinery Conference

I attended the 1st International Forest Biorefinery Conference (IFBC), held in Thunder Bay, Ontario on May 9-10, 2017.

I travelled to Thunder Bay regularly from about 2009 until my retirement in 2016, as the pilot lignin extraction facilities that I developed as a manager at FPInnovations were installed there and I had up to six employees on site. The city is big enough, at over 100,000 people, that there is a critical mass of university and college campuses, government offices and labs, and industrial capacity to get things done, but small enough that everyone knows everyone else. So it was nice heading back to see old colleagues and partners.

It was also nice to see that this conference, organised by Lakehead University's Biorefining Research Institute, featured a broad selection of solid papers, not just from Ontario-based Ph.D. students and professors but also from research institutes in Sweden, Finland, Belgium and the US, to name a few. Close to 150 attendees, including some 50 presenters, made the trek to Thunder Bay. A few highlights follow.

Opening plenary sessions included reviews of the LignoBoost (Per Tomani of RISE Bioeconomy) and LignoForce technologies (Mike Paleologou of FPInnovations). In a later session, Kirsten Maki, also of FPInnovations, described the LignoForce pilot plant in Thunder Bay and installation of the full-scale plant at the West Fraser mill in Hinton, Alberta. I won't get into the details as I am biased -- the LignoForce system, developed by Mike and scaled up by Kirsten when they worked for me, is clearly better, but don't let that influence your decision as to which one to buy for your mill. (See my report on the 7th NWBC, here, for a description of the LignoBoost pilot at Backhammar, Sweden).

Michel Jean of Domtar described the company's move to novel products. Paper, specifically uncoated freesheet where they are a market leader, currently represents 50% of sales, but this is declining at 3% to 5% per year. He mentioned the need to move slowly into novel bio-products or risk failure, and stressed the importance of understanding the markets when doing so. That being said, they have four projects on the go, having triaged a much wider set of several hundred ideas:
  • Cellulose nano-filaments made in the CelluForce plant at Windsor, Quebec. This joint venture with FPInnovations now has added investments from Fibria and Schlumberger. 
  • Lignin, dubbed Bio-Choice lignin, from the LignoBoost plant at Plymouth, North Carolina. 
  • The so-called 'super pulp' cellulose filament additive made in Dryden, Ontario.
  • Compounding of lignin with commodity thermoplastics at Espanola, Ontario. 
Interestingly, while 75% of Domtar's manufacturing capacity is in the US, three of the four bio-products plants are in Canada. It is also nice to see a company of Domtar's size prepared to invest at this level, and one can only hope it pays off, if only to prod their competitors to boost their own spending on innovation.

Alan Smith, Director of Business Development at Avantium, described fast-moving developments in this growing company. The company has spun out its core YXY technology, for converting fructose to PEF, into a joint venture with BASF called Synvina. They have also developed a proprietary wood-to-sugars platform based on patented improvements to the classic high-acid, low-temperature process exemplified by the old HCl CleanTech process and many others. The improvements are said to cover acid/sugar separation, materials of construction, and lignin de-acidification. (If getting sulfur out of lignin from kraft mills is important, I can only assume getting the chlorine out of lignin from a hydrochloric acid process will be no less so.) The process generates three streams: a C5/C6 sugar stream from the hemicellulose portion of the wood, a glucose stream from the cellulose, and a sugar-free lignin. All three must be sold; this will be a recurring theme in the world of wood-to-sugars processes. He commented that the cost basis for the glucose will depend on the market value of the other products, which I assume means that the glucose will only be profitable if enough revenue is obtained from the other two streams. This, too, will be a recurring theme in this space. There are plans for an eventual plant consuming 300,000 to 400,000 dry tonnes of wood per year.

Finally, Avantium is working on a sugar to bio-MEG (monoethylene glycol) pathway which would allow sugar to replace both components in plastic bottles. The pathway is said to be much cheaper than traditional bio-MEG routes, and competitive with petroleum-based MEG. Their partnerships with customer-facing companies like Coca-Cola will ensure that the techno-economic analyses will be thorough. This is one to watch.

On the biofuels front, Jack Saddler of UBC covered pathways to biojet in the plenary session. Later, two entire sessions went into greater detail on various biofuels pathways. I won't cover these here; Jack admitted that kerosene is cheap and bio-jet only works, economically, because there are policy and other non-business drivers that overcome the poor economics. My feeling continues to be that wood in particular is too expensive to make into fuels, and that value-added products must be the route forward when wood is the feedstock. Fuels will come from any left-overs, not the reverse. And since the value-added pathways are more challenging, both technically and economically, this is where the effort needs to be.

A number of academic presenters, PhD students or their supervisors, described early-stage bio-chemicals and bio-materials pathways variously involving glycerol, pyrolysis oils, bio-carbon, PHAs and other intermediates. It is hard to say at this point which ones will do well, because success depends as much on luck or marketing approaches as on technical excellence. (The old folks among you will recall the VHS versus Beta battles.)

One thing is clear: pathways to aromatics remain critical if wood-to-sugars pathways are to be economically viable. Ludo Diels of VITO in Belgium described pathways from sugars (furans via glucose, or furfuryl alcohol via xylose) and from lignin. The low reactivity, high molecular weight and high polydispersity of lignin compared to petroleum-based aromatics remain problems, according to Diels. An interesting way of looking at different molecules is on a plot of percent oxygen content versus percent hydrogen content, as proposed by Thomas Farmer [1]: petrochemical molecules all sit along the x-axis (essentially no oxygen), while lignin and cellulose are to the left and up (lower hydrogen content, but more oxygen). In between are a range of oxygenated petrochemicals, for instance polyethylene terephthalate, (C10H8O4)n. The length and complexity of the track a transformation process traces across this graph, from the proposed feedstock (petrochemical or biomass) to the proposed end product, is an indicator of how difficult the process is in terms of hydrogenation or de-oxygenation. Given this, going all the way from lignin to one of the BTX molecules is probably not necessary (or feasible), especially if you are then going to re-oxygenate to PET, so intermediate lignin products with new functionalities will be critical.
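To make the idea concrete, here is a small sketch (my own, not from the presentation) that computes the oxygen and hydrogen mass fractions placing a few familiar molecules on such a plot:

```python
# Mass % oxygen and % hydrogen, the two axes of Farmer's plot.
# Polymer entries use the repeat-unit formula.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

molecules = {
    "benzene (BTX)":         {"C": 6,  "H": 6,  "O": 0},
    "ethylene":              {"C": 2,  "H": 4,  "O": 0},
    "PET repeat unit":       {"C": 10, "H": 8,  "O": 4},
    "glucose":               {"C": 6,  "H": 12, "O": 6},
    "cellulose repeat unit": {"C": 6,  "H": 10, "O": 5},
}

for name, formula in molecules.items():
    mass = sum(ATOMIC_MASS[el] * count for el, count in formula.items())
    pct_o = 100 * ATOMIC_MASS["O"] * formula["O"] / mass
    pct_h = 100 * ATOMIC_MASS["H"] * formula["H"] / mass
    print(f"{name:22s}  %O = {pct_o:4.1f}   %H = {pct_h:4.1f}")
```

The petrochemicals land on the x-axis at zero oxygen, glucose and cellulose sit at roughly 50% oxygen, and PET falls in between, which is Farmer's point: the further apart feedstock and product sit on this plot, the more hydrogenation or de-oxygenation work the process has to do.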

On the policy and analysis side, Cooper Robinson of Cap-Op Energy described the processes for getting Renewable Fuel Credits (RFS2) and for certifying a fuel under California's Low Carbon Fuel Standard (LCFS). These are here to stay, according to him, but the animosity of the current US government to the EPA may be a threat, at least to the RFS program. Other issues include how to allocate GHG emissions in the case of multiple products, where not all are fuels, and how to get credit for a bio-chemical that may displace a GHG-intensive petroleum-based pathway to an identical or similar molecule. This is a complex space where money can be made if the right accounting procedures are put in place.

Peter Milley, of Halifax Global and a graduate student at Queen's University in Kingston, Ontario, described policy issues in the context of his PhD thesis, which is related to commercially viable pathways to a forest bioeconomy. The Canadian track record is not pretty, with a range of relatively uncoordinated approaches, applied reactively rather than proactively as part of a long-term strategic plan, and with little in the way of follow-up once deadlines expire. He offered the Finnish national bioeconomy strategy (click here) as an example worth reading, along with reports from the OECD (here) and the EU (here). That being said, I would argue the Canadian approach has been far more effective than the large grants from the US Department of Energy; Canadian funding has generally gone to successful projects and has not been sucked into quagmires such as the KiOR or Range Fuels disasters. As a result, progress has been slow but has tended to generate better results per dollar of taxpayer money than in the US.

Unfortunately, the organisers scheduled similar sessions in parallel rather than sequentially, so that attendees were forced to choose which bio-fuels session, or which policy session, to attend. As a result, I missed an entire policy session with some very interesting papers, as well as some biorefinery talks I would have liked to see. Hopefully this will be changed in future events. Apart from this quibble, the quality of the presentations and the breadth of expertise in the audience were a very nice surprise given the location, and I am hoping there will be a second edition a year or 18 months from now.

Were you there? Do your recollections and analysis agree with mine, or do you have a different viewpoint? Did you see interesting presentations that I did not discuss? Drop me a note using the Comments box (for public use) or by e-mail (if you want your comments kept private): Tom (at) TCBrowne.ca.

References:
[1] T.J. Farmer and M. Mascal, "Platform Molecules", Chapter 4 in Introduction to Chemicals from Biomass, 2nd Ed., Wiley, 2014.