ArchiveOrangemail archive

The Haskell Cafe


haskell-cafe.haskell.org
(List home) (Recent threads) (38 other Haskell lists)

Subscription Options

  • RSS or Atom: Read-only subscription using a browser or aggregator. This is the recommended way if you don't need to send messages to the list. You can learn more about feed syndication and clients here.
  • Conventional: All messages are delivered to your mail address, and you can reply. To subscribe, send an email to the list's subscribe address with "subscribe" in the subject line, or visit the list's homepage here.
  • Moderate traffic list: up to 30 messages per day
  • This list contains about 107,356 messages, beginning Oct 2000
  • 14 messages added yesterday
Report the Spam
This button sends a spam report to the moderator. Please use it sparingly. For other removal requests, read this.
Are you sure? yes no

Re: [Haskell-cafe] Can Haskell outperform C++?

Ad
Roman Werpachowski 1337249314Thu, 17 May 2012 10:08:34 +0000 (UTC)
> From: "Richard O'Keefe" 
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: Haskell Cafe 
> Message-ID: 
> Content-Type: text/plain; charset=us-ascii
>
> On 17/05/2012, at 2:04 PM, Gregg Lebovitz wrote:
>
>> Richard,
>>
>> Thank you. This is an example of what I had in mind when I talked about changing the playing field. Do you have a slide deck for this lecture that you would be willing to share with me? I am very interested in learning more.
>
>
> No slide deck required.  The task is "generating alternating permutations".
>
> Method 1: generate permutations using a backtracking search;
>          when a permutation is generated, check if it is alternating.
>
> Method 2: use the same backtracking search, but only allow extensions
>          that preserve the property of being an alternating permutation.Gregg,

what makes Method 2 so much harder than Method 1 to implement in C or C++?

Regards,
RW
Gregg Lebovitz 1337257519Thu, 17 May 2012 12:25:19 +0000 (UTC)
Roman,

I think this question is for Richard. I haven't had a chance to play 
with these methods. I will try to do that today.

GreggOn 5/17/2012 6:07 AM, Roman Werpachowski wrote:
>> From: "Richard O'Keefe"
>> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
>> To: Haskell Cafe<
>> Message-ID:
>> Content-Type: text/plain; charset=us-ascii
>>
>> On 17/05/2012, at 2:04 PM, Gregg Lebovitz wrote:
>>
>>> Richard,
>>>
>>> Thank you. This is an example of what I had in mind when I talked about changing the playing field. Do you have a slide deck for this lecture that you would be willing to share with me? I am very interested in learning more.
>>
>> No slide deck required.  The task is "generating alternating permutations".
>>
>> Method 1: generate permutations using a backtracking search;
>>           when a permutation is generated, check if it is alternating.
>>
>> Method 2: use the same backtracking search, but only allow extensions
>>           that preserve the property of being an alternating permutation.
> Gregg,
>
> what makes Method 2 so much harder than Method 1 to implement in C or C++?
>
> Regards,
> RW
>
> _______________________________________________
> Haskell-Cafe mailing list
> 
> http://www.haskell.org/mailman/listinfo/haske...
Richard O'Keefe 1337311863Fri, 18 May 2012 03:31:03 +0000 (UTC)
On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
>> No slide deck required.  The task is "generating alternating permutations".
>> 
>> Method 1: generate permutations using a backtracking search;
>>          when a permutation is generated, check if it is alternating.
>> 
>> Method 2: use the same backtracking search, but only allow extensions
>>          that preserve the property of being an alternating permutation.
> 
> Gregg,
> 
> what makes Method 2 so much harder than Method 1 to implement in C or C++?It was me, not Gregg.  There was and is no claim that method 2 is "much harder
to implement in C or C++".  In fact both methods *were* implemented easily in C.
The claim was and remains solely that
THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
   can be bigger than
THE TIME DIFFERENCE BETWEEN *LANGUAGES*
   and is in this particular case.

(And that's despite the fact that the C version kept the set of unused
elements as a one-native-word bit mask, while the Prolog version kept it
as a linked list.) 

There is also a second claim, which I thought was uncontentious:
the relative difficulty of tasks varies with language.
Roman Werpachowski 1337341567Fri, 18 May 2012 11:46:07 +0000 (UTC)
> Date: Fri, 18 May 2012 15:30:09 +1200
> From: "Richard O'Keefe" 
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: Roman Werpachowski 
> Cc: 
> Message-ID: 
> Content-Type: text/plain; charset=us-ascii
>
>
> On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
>>> No slide deck required.  The task is "generating alternating permutations".
>>>
>>> Method 1: generate permutations using a backtracking search;
>>>          when a permutation is generated, check if it is alternating.
>>>
>>> Method 2: use the same backtracking search, but only allow extensions
>>>          that preserve the property of being an alternating permutation.
>>
>> Gregg,
>>
>> what makes Method 2 so much harder than Method 1 to implement in C or C++?
>
>
> It was me, not Gregg.My apologies to you and Gregg.> There was and is no claim that method 2 is "much harder
> to implement in C or C++".  In fact both methods *were* implemented easily in C.OK, got that now. So Haskell doesn't have a *big* advantage over C w/r
to the ease of implementation of both algorithms?> The claim was and remains solely that
> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>   can be bigger than
> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>   and is in this particular case.Yes, but aren't the differences in the same ballpark, and if we want
to compare *languages*, we should use identical algorithms to make the
comparison fair.>
> (And that's despite the fact that the C version kept the set of unused
> elements as a one-native-word bit mask, while the Prolog version kept it
> as a linked list.)
>
> There is also a second claim, which I thought was uncontentious:
> the relative difficulty of tasks varies with language.It matters much less if you write the code to be run multiple times.

Regards,
RW
ok1337359154Fri, 18 May 2012 16:39:14 +0000 (UTC)
>> There was and is no claim that method 2 is "much harder
>> to implement in C or C++".  In fact both methods *were* implemented
>> easily in C.
>
> OK, got that now. So Haskell doesn't have a *big* advantage over C w/r
> to the ease of implementation of both algorithms?In the case of these specific algorithms, no.
In the case of other backtracking algorithms, it could have
quite a big advantage.>
>> The claim was and remains solely that
>> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>>   can be bigger than
>> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>>   and is in this particular case.
>
> Yes, but aren't the differences in the same ballpark,(a) only to the degree that you are willing to call
all language differences "in the same ballpark".
(b) no, because the speed difference between the algorithms
grows without bound as the problem size increases, while
the speed difference between the languages does not.

Here's another example:  the UNIX 'tsort' command.
I have versions in Java, Smalltalk, and C.  The Java and
Smalltalk versions are about the same speed, linear in
the size of the graph.  The C version appears to be the
original UNIX code, and it is *quadratic* in the number
of nodes.  Result:  Smalltalk running faster than C by
a ratio that increases without bound as the input grows.
This is actually a case where it was easier to write fast
code than slow code in Smalltalk, easier to write slow code
than fast code in C.  (Built in hash tables, what else?)> and if we want
> to compare *languages*, we should use identical algorithms to make the
> comparison fair.In the permutation generation example, I was talking about
four programs:
           Language X  Language Y
Method 1   Program X1  Program Y1   -- identical algorithms
Method 2   Program X2  Program Y2   -- identical algorithms

However, "identical" isn't clearly defined.
For example, is a Java program that uses HashMap<String,Integer>
-- and thus allocates millions of Integer objects --
"identical" to a Smalltalk program using Dictionary
-- where the integers fit into the immediate range, so that
no integer boxes are needed at all -- ?
In the 'tsort' case, it turns out that the Java and Smalltalk
versions are I/O bound with over 90% of the time spent just
reading the data.  They have I/O libraries with very different
structures, so what does "identical algorithms" mean?  If you
are using dictionaries/hashmaps, and the two languages have
implementations that compute different hash functions for strings,
is _that_ using the same implementation?  (Some hash functions
are amazingly bad, see Andres Valloud's book.)>> There is also a second claim, which I thought was uncontentious:
>> the relative difficulty of tasks varies with language.
>
> It matters much less if you write the code to be run multiple times.You misunderstand.  The issue is whether you bother to write the
more efficient code *at all*.  The tsort code in C I was looking
at was real production code from a UNIX system that is still on
the market, and in the 20 or so years since it was written, nobody
had ever bothered to fix the blatant efficiency bug.  In other
languages, it would have been easier to write the program without
the bug in the first place.
Isaac Gouy 1337363545Fri, 18 May 2012 17:52:25 +0000 (UTC)
----- Original Message -----
> From: "" 
> Sent: Friday, May 18, 2012 9:38 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?----- Original Message -----
> From: "" 
> Sent: Friday, May 18, 2012 9:38 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?

-snip-
>>  and if we want
>>  to compare *languages*, we should use identical algorithms to make the
>>  comparison fair.
> 
> In the permutation generation example, I was talking about
> four programs:
>            Language X  Language Y
> Method 1   Program X1  Program Y1   -- identical algorithms
> Method 2   Program X2  Program Y2   -- identical algorithms
> 
> However, "identical" isn't clearly defined.Moreover, being absolutely sure that the algorithms are in some sense 
"identical" might make comparison pointless - for example, when the same assembly 
is generated by gcc from a C program and gnat from an Ada program.


-snip-> In the 'tsort' case, it turns out that the Java and Smalltalk
> versions are I/O bound with over 90% of the time spent just
> reading the data.My guess is that they could be written to do better than that - but it's 
idiotic of me to say so without understanding the specifics, please 
forgive me ;-)> They have I/O libraries with very different
> structures, so what does "identical algorithms" mean?  If you
> are using dictionaries/hashmaps, and the two languages have
> implementations that compute different hash functions for strings,
> is _that_ using the same implementation?Of course, to some degree, user defined hash functions remedy that specific problem.


I agree with the thrust of your comments - even programming languages (and implementations) that seem similar, are often so different (when we get down to specifics) that comparison between programs written in different languages is a matter of making the best of a bad job.

But we're still going to ask - Will my program be faster if I write it in language X? - and we're 
still going to wish for a simpler answer than - It depends how you write it!
Paulo Pocinho 1337381650Fri, 18 May 2012 22:54:10 +0000 (UTC)
I've been following the topic in both threads. Very nice discussion.On 18 May 2012 18:51, Isaac Gouy  wrote:
>
> Moreover, being absolutely sure that the algorithms are in some sense
> "identical" might make comparison pointless - for example, when the same assembly
> is generated by gcc from a C program and gnat from an Ada program.Suppose we take the most efficient algorithm, in terms of time and
space. The demonstrations given before this post indicate the language
can be more important than the algorithm. There are two different
takes on the situation.
On one hand we have time and space. On the other, man-hours and
reliability. Both languages deal with this ratio.> But we're still going to ask - Will my program be faster if I write it in language X? - and we're
> still going to wish for a simpler answer than - It depends how you write it!Change that question to: Will my language be faster if I write program X?

Good C++ implementations are easy to find. However, everything is
exposed. De-bugging is taken for granted. Unless everyone in a team
follows some strict practice, productivity may be at risk. It is more
difficult to provide reliability (i.e. programs as proofs [1]). It is
everywhere - that only increases probability to find more optimized
algorithms in C++ libraries.

The Haskell implementation enforces an interface to the programmer
(the low level time-space management, i.e. GC, IO monads,
parallelism). Improvements in implementation increase efficiency of
the program (i.e. optimization in standard libraries [2]). Risk of
damaging productivity is low.

Taking that you could derive one efficient algorithm down to assembly,
the fastest language is a matter of implementation. Then, in equal
grounds, it's rather easy choice.--
[1] http://homepages.inf.ed.ac.uk/wadler/papers/f...
[2] https://groups.google.com/d/msg/haskell-cafe/...
Richard O'Keefe 1337553714Sun, 20 May 2012 22:41:54 +0000 (UTC)
On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
>> In the 'tsort' case, it turns out that the Java and Smalltalk
>> versions are I/O bound with over 90% of the time spent just
>> reading the data. 
> 
> My guess is that they could be written to do better than that - but it's 
> idiotic of me to say so without understanding the specifics, please 
> forgive me ;-)Actually, I/O bound is *good*.

Here's the times from the C version, which has been hacked hard in order
to be as fast as I could make it.

 N     total      input      process    output
 1000; 0.004618 = 0.004107 + 0.000438 + 0.000073
 2000; 0.014467 = 0.012722 + 0.001609 + 0.000136
 5000; 0.059810 = 0.051308 + 0.008199 + 0.000303
10000; 0.204111 = 0.150638 + 0.052800 + 0.000673
20000; 0.717362 = 0.518343 + 0.197655 + 0.001364
50000; 3.517340 = 2.628550 + 0.885456 + 0.003331

N here is the number of nodes in the graph;
the number of edges is floor(N**1.5 * 0.75).
Input is the read-word + look up in hash table time.
Process is the compute-the-transitive-closure time.
Output is the time to write the node names in order.
All node names had the form x## with ## being 1..10000.
This is with my own low level code; using scanf(%...s)
pushed the input time up by 40%.

The Mac OS X version of the tsort command took
31.65 CPU seconds for N=10,000, of which
28.74 CPU seconds was 'system'.Like I said, the languages I used in this test
>> ... have I/O libraries with very different
>> structures, so what does "identical algorithms" mean?  If you
>> are using dictionaries/hashmaps, and the two languages have
>> implementations that compute different hash functions for strings,
>> is _that_ using the same implementation? 
> 
> Of course, to some degree, user defined hash functions remedy that specific problem.While creating other, and perhaps worse, ones.

For example, in the Smalltalk code, if you use a Dictionary of Strings,
you're getting Robert Jenkin's hash function in optimised C.  If you
supply your own, you're getting a very probably worse hash function
and it's going to run rather slower.  And above all, the stuff you are
benchmarking is no longer code that people are actually likely to write.> But we're still going to ask - Will my program be faster if I write it in language X? - and we're 
> still going to wish for a simpler answer than - It depends how you write it!Here's another little example.  I had a use for the Singular Value Decomposition
in a Java program.  Should I use pure Java or native C?

Pure Java taken straight off the CD-ROM that came with a large
book of numerical algorithms in Java:   T seconds.

After noticing that the code was just warmed over Fortran, and was
varying the leftmost subscript fastest (which is good for Fortran,
bad for most other languages) and swapping all the matrix dimensions: T/2 seconds.

After rewriting in C:  T/4 seconds.

After rewriting the C code to call the appropriate BLAS
and thereby using tuned code for the hardware, T/7 seconds.

Since this was going to take hundreds of seconds per run, the answer was easy.

A simple little thing like using a[i][j] vs a[j][i] made a
factor of 2 difference to the overall speed.

"It depends" is the second best answer we can get.
The best answer is one that says _what_ it depends on.
Isaac Gouy 1337617000Mon, 21 May 2012 16:16:40 +0000 (UTC)
> From: Richard O'Keefe 
> Sent: Sunday, May 20, 2012 3:41 PM

> On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
>>> In the 'tsort' case, it turns out that the Java and Smalltalk
>>> versions are I/O bound with over 90% of the time spent just
>>> reading the data.
>>
>> My guess is that they could be written to do better than that - but
> it's
>> idiotic of me to say so without understanding the specifics, please
>> forgive me ;-)
>
> Actually, I/O bound is *good*.Why would that be good or bad?

I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.


-snip->> Of course, to some degree, user defined hash functions remedy that specific problem.
>
> While creating other, and perhaps worse, ones.
>
> For example, in the Smalltalk code, if you use a Dictionary of Strings,
> you're getting Robert Jenkin's hash function in optimised C.  If you
> supply your own, you're getting a very probably worse hash function
> and it's going to run rather slower.  And above all, the stuff you are
> benchmarking is no longer code that people are actually likely to write.I think you're being a touch quarrelsome :-)

Do *all* Smalltalk language implementations provide that same hash function in optimised C?

Have Smalltalk language implementations *always* provided that same hash function in optimised C?

How might that hash function be used when the (not necessarily Smalltalk) language implementation does not provide it?

Have you researched what code people are actually likely to write, or are you just speculating? ;-)



-snip->>  But we're still going to ask - Will my program be faster if I write it 
>>  in language X? - and we're still going to wish for a simpler answer
>> than - It depends how you write it!
> 
> Here's another little example.  I had a use for the Singular Value 
> Decomposition in a Java program.  Should I use pure Java or native C?
> 
> Pure Java taken straight off the CD-ROM that came with a large
> book of numerical algorithms in Java:   T seconds.
> 
> After noticing that the code was just warmed over Fortran, and was
> varying the leftmost subscript fastest (which is good for Fortran,
> bad for most other languages) and swapping all the matrix dimensions: T/2 
> seconds.
> 
> After rewriting in C:  T/4 seconds.
> 
> After rewriting the C code to call the appropriate BLAS
> and thereby using tuned code for the hardware, T/7 seconds.
> 
> Since this was going to take hundreds of seconds per run, the answer was easy.Maybe the more interesting question was - Should I use a scripting language + BLAS.> "It depends" is the second best answer we can get.
> The best answer is one that says _what_ it depends on.That may be the best answer to some other question - but for the stated question I think were looking for a Yes or a No :-)
Richard O'Keefe 1337651740Tue, 22 May 2012 01:55:40 +0000 (UTC)
On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
>> Actually, I/O bound is *good*.
> 
> Why would that be good or bad?The context here is a UNIX-style topological sorting program.
Being I/O bound means that the program is limited by how fast
it can read the data.  If 90% of the time goes into reading
the data, that means that the *algorithmic* part is fast enough.

There may be very little one can do about the I/O part.

> I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.

I didn't _think_ I'd omitted anything important.  Oh well.

    For 25,000 nodes and 2,989,635 edges,
	Java (first version) 4.83 seconds   -- uses ArrayList, HashMap
	Java (2nd version)   4.17 seconds   -- uses own IntList
	Java (3rd version)   4.09 seconds   -- different approach
        Smalltalk            4.40 seconds
	C                    0.92 seconds   -- my code
        Mac OS X tsort(1)  150.35 seconds   -- 11.84 user + 138.51 system

    For 50,000 nodes and 8,385,254 edges,
	Java (first version) ran out of memory after 89.54 seconds (default heap)
	Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
        Java (3rd version)  15.07 seconds
        Smalltalk           14.52 seconds
        C                    2.36 seconds
	Mac OS X tsort(1)  482.32 seconds   -- 41.55 user + 440.77 system

The Mac OS X tsort(1) is presumably written in C.  Symbols have been
stripped, so even with Instruments I can't see where the time is going.
Judging from its memory behaviour, I believe that most of the time is
going in the input phase, which suggests a spectacularly inefficient
symbol table, which suggests that it might be using a not-very-modified
version of the Unix V7 code, which used a linear search...>>> Of course, to some degree, user defined hash functions remedy that specific problem.
>> 
>> While creating other, and perhaps worse, ones.
>> 
>> For example, in the Smalltalk code, if you use a Dictionary of Strings,
>> you're getting Robert Jenkin's hash function in optimised C.  If you
>> supply your own, you're getting a very probably worse hash function
>> and it's going to run rather slower.  And above all, the stuff you are
>> benchmarking is no longer code that people are actually likely to write.
> 
> I think you're being a touch quarrelsome :-)That upset me.  All I was saying is that surely the only *point* of
talking about the performance of *languages* is to get some idea of
how well programs are likely to do, and not any old specially crafted
code, but the kind of code that is likely to be written for purposes
other than benchmarking.  So the more you bash on a particular example
to get the time down, the less representative it is of the kind of code
people are likely to write *for purposes other than benchmarking*.

Just today I marked a student program where their program took 15 minutes
and my model answer took a touch under 4 milliseconds.  I explained to
them _that_ their program was spectacularly slow.  They already knew _why_
it was.  I also explained the one trick (lazy initialisation) that could
have hugely improved the time.  I also told them that they had made the
right decision, to optimise *their* time, not the computer's, in their
particular context.

> Do *all* Smalltalk language implementations provide that same hash function in optimised C?

*That* function, no.  *Some* hash function as a "primitive" (meaning
optimised C), well, I don't have every Smalltalk, but the ones I do
have, I've checked, and yes they do.> 
> Have Smalltalk language implementations *always* provided that same hash function in optimised C?Obviously not: Smalltalk-80 ran on Xerox D-machines, which didn't *have*
C.  "Primitives" were implemented in microcode.> 
> Have you researched what code people are actually likely to write, or are you just speculating? ;-)I'm in my 6th decade.  I've seen a lot of code in a lot of languages.> 
>> "It depends" is the second best answer we can get.
>> The best answer is one that says _what_ it depends on.
> 
> That may be the best answer to some other question - but for the stated question I think were looking for a Yes or a No :-)The subject line has the obvious and boring answer "yes, of course".

I recall one time when I wanted to analyse some data and typed up
Fortran code for a suitable technique from a book.  It didn't work.
After struggling to debug it for a while, I implemented the whole
thing from scratch in Prolog.  Then I went back to the Fortran
code, spotted my typing mistake, and fixed it.  Result, two working
programs.  The Prolog program (compiled to a VM which was emulated;
the compiler was not very clever) ran faster than the Fortran program
(compiled to native code by a seriously optimising compiler from a
major computer manufacturer).

Reason?  Same algorithm, different data structure.  The data structure
was one that was easy to express in Prolog (or Haskell!) but would
have been insanely difficult in Fortran 77.  This century's Fortran is
of course another matter.

The point here of course is that there are data structures and algorithms
that are easy to express in some languages but hard in others, so that
in code written *for purposes other than benchmarking*, they just would
not be _used_ in one of those languages.
Isaac Gouy 1337705739Tue, 22 May 2012 16:55:39 +0000 (UTC)
> From: Richard O'Keefe
> Sent: Monday, May 21, 2012 6:54 PM

> On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
>>>  Actually, I/O bound is *good*.
>> 
>>  Why would that be good or bad?
> 
> The context here is a UNIX-style topological sorting program.
> Being I/O bound means that the program is limited by how fast
> it can read the data.  If 90% of the time goes into reading
> the data, that means that the *algorithmic* part is fast enough.
> 
> There may be very little one can do about the I/O part.Maybe you could say how the Java I/O is being done.


> I didn't _think_ I'd omitted anything important.  Oh well.

-snip->     For 50,000 nodes and 8,385,254 edges,
>     Java (first version) ran out of memory after 89.54 seconds (default heap)
>     Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
>         Java (3rd version)  15.07 seconds
>         Smalltalk           14.52 seconds
>         C                    2.36 secondsfwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.

fwiw my expectation is that should be possible to make the Java program considerably faster. Of course, what I expect and what happens are often very different ;-)>>>>  Of course, to some degree, user defined hash functions remedy that 
> specific problem.
>>> 
>>>  While creating other, and perhaps worse, ones.
>>> 
>>>  For example, in the Smalltalk code, if you use a Dictionary of Strings,
>>>  you're getting Robert Jenkin's hash function in optimised C.  
> If you
>>>  supply your own, you're getting a very probably worse hash function
>>>  and it's going to run rather slower.  And above all, the stuff you 
> are
>>>  benchmarking is no longer code that people are actually likely to 
> write.
>> 
>>  I think you're being a touch quarrelsome :-)
> 
> That upset me.I'm sorry that gentle comment upset you.> All I was saying is that surely the only *point* of
> talking about the performance of *languages* is to get some idea of
> how well programs are likely to do, and not any old specially crafted
> code, but the kind of code that is likely to be written for purposes
> other than benchmarking.  So the more you bash on a particular example
> to get the time down, the less representative it is of the kind of code
> people are likely to write *for purposes other than benchmarking*.Certainly less representative of the kind of code students are likely to write for course credits, and the kind of code people are likely to write to complete some project task before they hand it off to someone else, and the kind of code people are likely to write until their program is actually put into use and suddenly they are working the weekend to make their program faster.A more positive way to think of - code written for purposes of benchmarking - is that it's like code written to address a performance hot spot. 


> Just today I marked a student program where their program took 15 minutes
> and my model answer took a touch under 4 milliseconds.  I explained to
> them _that_ their program was spectacularly slow.  They already knew _why_
> it was.  I also explained the one trick (lazy initialisation) that could
> have hugely improved the time.  I also told them that they had made the
> right decision, to optimise *their* time, not the computer's, in their
> particular context.The whole point is carried by your final assertion.

Here's another anecdote - in my first week on site, I overheard a group of engineers arguing that Smalltalk should be abandoned because calculation times were incredibly slow. I peeked at the code, saw that it was littered with unnecessary numeric conversions (plus unnecessary arbitrary precision arithmetic), fixed and timed a sample, and left them with a lot of rework to do all across their library code. 

The "kind of code people are likely to write" is sometimes best described as bad.That can have consequences which spiral out of proportion -- an engineer writes some bad Smalltalk, an app performs so slowly the business group loses money, the manager of the business group notices and is shown that the slow app is the problem (and it really is the problem), the manager now "knows" that "Smalltalk is slow", the manager moves the business group away from Smalltalk, the manager is promoted and moves the whole organization away from Smalltalk. That's also an anecdote.


> *That* function, no.  *Some* hash function as a "primitive" (meaning
> optimised C), well, I don't have every Smalltalk, but the ones I do
> have, I've checked, and yes they do.Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?


>>  Have you researched what code people are actually likely to write, or are 
> you just speculating? ;-)
> 
> I'm in my 6th decade.  I've seen a lot of code in a lot of languages.So just speculating.


> The subject line has the obvious and boring answer "yes, of course".

I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)


-snip-> The point here of course is that there are data structures and algorithms
> that are easy to express in some languages but hard in others, so that
> in code written *for purposes other than benchmarking*, they just would
> not be _used_ in one of those languages.Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse. 

(And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)
Richard O'Keefe 1337742011Wed, 23 May 2012 03:00:11 +0000 (UTC)
On 23/05/2012, at 4:54 AM, Isaac Gouy wrote:
>> There may be very little one can do about the I/O part.
> 
> Maybe you could say how the Java I/O is being done.

>>     For 50,000 nodes and 8,385,254 edges,
>>     Java (first version) ran out of memory after 89.54 seconds (default heap)
>>     Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
>>         Java (3rd version)  15.07 seconds
>>         Smalltalk           14.52 seconds
>>         C                    2.36 seconds
> 
> fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.My own experience is that Java is anywhere between 2 times slower and 150 times
slower than C.  This is not for number crunching; one _would_ expect Java to be
about the same as C for that because it is working at the same level as C.  But
string processing and text I/O using the java.io.* classes aren't brilliant.

   Reader r = new BufferedReader(new InputStreamReader(System.in));
            
This "layers of wrappers" approach to text I/O adds overheads of its own, but
less than I'd thought.  Using System.in directly takes the time down from
15.07 seconds to 11.11 seconds.  The code used Character.isWhitespace(c) to
test for a white space character; replacing that by c <= ' ' brought the time
down further to 10.87 seconds.

With both of these changes, we are moving away from recommended good practice;
the faster code is the kind of code people are not supposed to write any more.

Characters are read one at a time using r.read().  There are no fast "skip
characters in this set" or "take characters in this set and return them as
a string" methods in the Reader or InputStream interfaces, and StreamTokenizer
reads characters one at a time using .read() also, so would be slower.

As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a
JIT.  Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied
from them) and there is now a pretty good one for the free systems Squeak and
Pharo.  These particular measurements were made using my own Smalltalk compiler
which is an oddity amongst Smalltalks: a whole program compiler that compiles
via C.  Yes, most of the good ideas came from INRIA, although ST/X does
something not entirely dissimilar.

> fwiw my expectation is that should be possible to make the Java program considerably faster.

I have used profiling to find where the time was going.
Approximately 70% is spent in the one method that reads a "word":
 - a loop skipping white space characters
 - a loop reading other characters and adding them to a StringBuilder
 - [looking the string up in a HashMap and creating and entering a
    new Node if necessary -- this time is not part of that 70%].

Any reasonable person would expect file reading to be buffered and
for the read-one-byte method to usually just fiddle a few integers
and fetch an element of an array.  In fact the method is native, so
every character requires switching from the Java universe to the C
one and back.  (The Smalltalk equivalent is pretty close to fgetc().)

The only way to make this Java program 'considerably faster' is to
not read characters (or bytes) in the natural Java way.  (The way,
in fact, that java.io.StreamTokenizer does.)

And it's not INTERESTING, and it's not about LANGUAGES.
There is NOTHING about the Java language that makes code like this
necessarily slow.  It's the LIBRARY.  The java.io library was
designed for flexibility, not speed.  That's why there is a java.nio
library.  

Haskell is in pretty much the same boat.  I've always liked the
I/O model in the Haskell report.  I've expected simplicity from it,
not speed, and that's what I get.  But as all the new approaches
to I/O (like iteratees, which make my head hurt) make perfectly
clear, the limitations of the IO module are not limitations of
Haskell, or of JHC, but of a particular library.

And that's the point I was making with this example.  Why does
Smalltalk come out in the middle of the Java results?  A balance
between a language penalty (tagged integer arithmetic is a lot
slower than native integer arithmetic) and a library bonus (a
leaner meaner I/O design where there are wrappers if you want
them but you very seldom need them).  It's the great advantage
of using libraries rather than syntax: libraries can be changed.> 
>> All I was saying is that surely the only *point* of
>> talking about the performance of *languages* is to get some idea of
>> how well programs are likely to do, and not any old specially crafted
>> code, but the kind of code that is likely to be written for purposes
>> other than benchmarking.  So the more you bash on a particular example
>> to get the time down, the less representative it is of the kind of code
>> people are likely to write *for purposes other than benchmarking*.
> 
> Certainly less representative of the kind of code studentsLeave the students out of it.  It's less representative of the kind of
code I see written by Sun or in Apache stuff.
> 
> The "kind of code people are likely to write" is sometimes best described as bad.At last, something we can agree about!> 
>> *That* function, no.  *Some* hash function as a "primitive" (meaning
>> optimised C), well, I don't have every Smalltalk, but the ones I do
>> have, I've checked, and yes they do.
> 
> Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?Neither.  I am making the point that many benchmarks benchmark library
code rather than languages or compilers per se, and that the concept of
"same algorithm" is as a result very fuzzy.> 
> 
>>>   Have you researched what code people are actually likely to write, or are 
>> you just speculating? ;-)
>> 
>> I'm in my 6th decade.  I've seen a lot of code in a lot of languages.
> 
> So just speculating.How is "I have seen a lot of code" construed as "just speculating"?
I know what code people are likely to write because I have seen the
code that many people DID write.  I know what code people are likelyto write in Java because I have seen more Java (written by people
who really should know what they are doing) than I ever wanted to see
AND because I know how they are TOLD to write.
> 
> 
>> The subject line has the obvious and boring answer "yes, of course".
> 
> I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)Otherwise construed, "the way C++ is usually written".
C++ *can* be extremely fast.  One of my colleagues is very good at it.
But one of his key rules is "never use the standard template library,
including C++ strings.">> The point here of course is that there are data structures and algorithms
>> that are easy to express in some languages but hard in others, so that
>> in code written *for purposes other than benchmarking*, they just would
>> not be _used_ in one of those languages.
> 
> Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse. 
> 
> (And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)My evidence may be anecdotal, but it is better than arguing with
NO evidence.

The example you are commenting on here is one where I reported an
emulated Prolog program running faster than native Fortran 77.
Recall that Fortran 77
 - did not include recursion
 - did not include pointers
 - did not include user defined data types
The kind of dynamic tree construction and walking that is so
trivial and every day in Haskell was extremely difficult to
express in Fortran 77.

Not necessarily impossible.  A parser for English was once written
in Fortran, and I think that was before Fortran 77 even.  But hard.

A programming language that supports concurrency, or backtracking,
or constraint programming, or type level arithmetic, or dependent
types, or ... will make some things thinkable and straightforward
to express that would be neither in a language without such
support.
Isaac Gouy 1337791221Wed, 23 May 2012 16:40:21 +0000 (UTC)
> From: Richard O'Keefe 
> Sent: Tuesday, May 22, 2012 7:59 PM

> But string processing and text I/O using the java.io.* classes aren't brilliant.Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?


-snip-

>  Using System.in directly takes the time down from 15.07 seconds to 11.11 seconds.
-snip-> With both of these changes, we are moving away from recommended good practice;
> the faster code is the kind of code people are not supposed to write any more.Says who? Is that on your own authority or some other source you can point us to?


-snip-> As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a
> JIT.  Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied
> from them) ...As for Smalltalk, I earned my crust with commercial Smalltalks for a decade.> These particular measurements were made using my own Smalltalk compiler
> which is an oddity amongst Smalltalks: a whole program compiler that compiles
> via C.  Yes, most of the good ideas came from INRIA, although ST/X does
> something not entirely dissimilar.Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!

You have got to be joking.>>  fwiw my expectation is that should be possible to make the Java program 
>> considerably faster.-snip-
> Any reasonable person would expect ...

I'm happy to hear what *you* would expect.

-snip-> And it's not INTERESTING, and it's not about LANGUAGES.
> There is NOTHING about the Java language that makes code like this
> necessarily slow.  It's the LIBRARY.  The java.io library was
> designed for flexibility, not speed.  That's why there is a java.nio
> library.Here's the gorilla in the room question - So why doesn't your program use java.nio?> And that's the point I was making with this example.  Why does
> Smalltalk come out in the middle of the Java results?  A balance
> between a language penalty (tagged integer arithmetic is a lot
> slower than native integer arithmetic) and a library bonus (a
> leaner meaner I/O design where there are wrappers if you want
> them but you very seldom need them).  It's the great advantage
> of using libraries rather than syntax: libraries can be changed.No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.


-snip-> Neither.  I am making the point that many benchmarks benchmark library
> code rather than languages or compilers per se, and that the concept of
> "same algorithm" is as a result very fuzzy.Thank you, for stating your point clearly.


-snip-
> How is "I have seen a lot of code" construed as "just speculating"?

You seem to be generalizing from what you recollect without any way to control for the particularities of the situations you observed, or the particularities of your recollection. You don't seem to have data - just memories.


-snip-
> My evidence may be anecdotal, but it is better than arguing with NO evidence.

imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.
Richard O'Keefe 1337838777Thu, 24 May 2012 05:52:57 +0000 (UTC)
On 24/05/2012, at 4:39 AM, Isaac Gouy wrote:

>> From: Richard O'Keefe 
>> Sent: Tuesday, May 22, 2012 7:59 PM
> 
>> But string processing and text I/O using the java.io.* classes aren't brilliant.
> 
> Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?No.  Amongst other things, I have my own ByteString and ByteStringBuilder
classes that are basically clones of String and StringBuilder, and using
them makes surprisingly little direct difference; the point is saving
memory.

I have obtained large speedups in Java using Java by dodging around the
Java libraries.  Other people have reported the same to me.>> With both of these changes, we are moving away from recommended good practice;
>> the faster code is the kind of code people are not supposed to write any more.
> 
> Says who? Is that on your own authority or some other source you can point us to?It looks increasingly as though there is no point in this discussion.
Is there ANY conceivable criticism of Java that will not elicit
ad hominem attacks from you?

I have read more Java textbooks than I wished to.  I was on Sun's
Java techniques and tips mailing list for years.  I could go on,
but is there, *really*, any point?> 
>> These particular measurements were made using my own Smalltalk compiler
>> which is an oddity amongst Smalltalks: a whole program compiler that compiles
>> via C.  Yes, most of the good ideas came from INRIA, although ST/X does
>> something not entirely dissimilar.
> 
> Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
> 
> You have got to be joking.Why?  On various benchmarks, sometimes VisualWorks is better,
sometimes my system is better.  My system is utterly naive,
incorporating almost none of the classic Smalltalk optimisations.

I redid the test using VisualWorks NonCommercial.
It took about twice as long as my Smalltalk did.
According to 'TimeProfiler profile: [...]',
98% of the time is in the load phase; half of that
is down to the hash table.  A surprisingly small part
of the rest is due to actual input (ExternalReadStream>>next);
quite a bit goes into building strings and testing characters.

Why the difference?
With all due respect, VisualWorks still has the classic Smalltalk
implementation of hash tables.  Mine is different.  This is a
library issue, not a language issue.
One of the tasks in reading is skipping separators.
Since it's used a lot in parsing input, my library pushes that
right down to the bottom level of ReadStream and ChannelInputStream.
VisualWorks uses a single generic implementation that doesn't get
up to the low level dodges mine does.  And so on.

All *library* issues, not *compiler* or *language* issues.

Which is the whole point of this thread, as far as I am concerned.
C, Java, Smalltalk: this real example is dominated by *library*
level issues, not language issues or compiler issues.>> And it's not INTERESTING, and it's not about LANGUAGES.
>> There is NOTHING about the Java language that makes code like this
>> necessarily slow.  It's the LIBRARY.  The java.io library was
>> designed for flexibility, not speed.  That's why there is a java.nio
>> library.  
> 
> Here's the gorilla in the room question - So why doesn't your program use java.nio?
>Because that would be insane.

This is a program I originally whipped up in less than an hour
for two reasons:

(A) I wanted to provide some students with an example of a
    "work-list" algorithm that had some realism to it.
    For that purpose, the program had to be READABLE.

(B) To my astonishment, the tsort(1) programs in OpenSolaris
    and Mac OS X 10.6.8 turned out to be grotesquely slow for
    non-toy graphs.  I was expecting to have a use for the
    program myself, so as it stood, the Java version was
    already quite fast enough to be useful.  (As in, a LOT
    faster than the system version, even though the system
    version was written in C.)

The one issue I had with the first version was not time, but
space, so I explored two ways of making it take less space.

There is no NEED to rewrite the program to use java.nio;
having replaced the system version of the command the Java
version was no longer the bottleneck in my intended use.

For me personally, having no experience with java.nio,
it was *easier* to rewrite the program from scratch in C
than to overcome the java.nio learning curve.  And in any
case, I knew very well that I could get near enough to the
same order of improvement using InputStream and wrapping
my own buffering code over that (I've done that before).
Above all, since the students were even less familiar with
nio than I am, using nio would have destroyed the program's
utility for purpose (A).

As for the Smalltalk version, I often rewrite small things
into Smalltalk in order to find out what I'm doing wrong in
my implementation.> 
>> And that's the point I was making with this example.  Why does
>> Smalltalk come out in the middle of the Java results?  A balance
>> between a language penalty (tagged integer arithmetic is a lot
>> slower than native integer arithmetic) and a library bonus (a
>> leaner meaner I/O design where there are wrappers if you want
>> them but you very seldom need them).  It's the great advantage
>> of using libraries rather than syntax: libraries can be changed.
> 
> No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.No, I chose to
 - use the official Java plain text I/O library
 - the way the official Java series books and tutorials
   say it should be used
 - with a MINIMUM of wrapper layers.

And it was FAST ENOUGH TO BE USEFUL.
No, I chose to use that library THE WAY IT IS INTENDED TO BE USED.
It is the simplest most straightforward way to go.
It's the *same* "algorithm" that the C and Smalltalk versions use.

> imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.

"Past century"?  Insults, is it?

As for "how much better programs using other data structures and algorithms
perform", this whole thread is about how well programs using the SAME data
structures and algorithms perform, and whether we can assign much meaning
to that.  How could it possibly be better to do something irrelevant to the
topic?
Bryan O'Sullivan 1337840224Thu, 24 May 2012 06:17:04 +0000 (UTC)
On Wed, May 23, 2012 at 10:52 PM, Richard O'Keefe  wrote:

> "Past century"?  Insults, is it?
>Do you fine gentlemen absolutely have to continue this endless, offtopic,
unedifying back-and-forth in public? Please.
Isaac Gouy 1337842930Thu, 24 May 2012 07:02:10 +0000 (UTC)
Sorry Bryan, there are a couple of comments I should make a final reply to - I'll ignore the rest.> From: Richard O'Keefe 
> Sent: Wednesday, May 23, 2012 10:52 PM-snip->>  Says who? Is that on your own authority or some other source you can point 
>> us to?
> 
> It looks increasingly as though there is no point in this discussion.
> Is there ANY conceivable criticism of Java that will not elicit
> ad hominem attacks from you?It isn't an ad hominem attack to ask you who's the authority that made some recommendation.


-snip->>  Wait just a moment - you wrote "I didn't _think_ I'd omitted 
>>   anything important" and now it turns out that the measurements were made 
>>   using your personal Smalltalk implementation!
>> 
>>  You have got to be joking.
> 
> Why?Because you omitted basic information about the measurements you presented.


-snip->>  imo It would be better to "show how much better programs using other 
> data structures and algorithms perform those specific tasks" than brandish 
> anecdotes from a past century.
> 
> "Past century"?  Insults, is it?No, it's an echo of the words you used - "...insanely difficult in Fortran 77.  This century's Fortran is of course another matter."
wren ng thornton 1337747457Wed, 23 May 2012 04:30:57 +0000 (UTC)
On 5/22/12 12:54 PM, Isaac Gouy wrote:
> On 5/21/2012 6:54 PM, Richard O'Keefe wrote:
>>      For 50,000 nodes and 8,385,254 edges,
>>      Java (first version) ran out of memory after 89.54 seconds (default heap)
>>      Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
>>          Java (3rd version)  15.07 seconds
>>          Smalltalk           14.52 seconds
>>          C                    2.36 seconds
>
> fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.FWIW, that matches my expectations pretty well. Naive/standard Java 
performing slower than Smalltalk; highly tweaked Java using non-standard 
data types performing on-par with or somewhat faster than Smalltalk. 
That C is 7x faster is a bit on the high end, but for something like 
tsort I could imagine it'd be possible.

Do bear in mind that Java doesn't optimize ---that's the JIT's job--- 
and that the standard datatypes for Java are only good enough to suffice 
for casual use and to be better than *naively* implementing them 
yourself. They are extremely mediocre if you have any sort of 
"specialized" needs. If you know what you're doing in designing data 
structures, you can get amazing mileage out of writing a hand-tuned 
implementation that (1) avoids boxing native types, (2) uses a 
non-generic structure which has good complexities for the kinds of 
operations you'll actually be using, and (3) doesn't bother trying to 
implement the ridiculous APIs of the standard data types.

But even still, in my experience of using Smalltalk, the standard data 
structures are much better done and so they will be on-par with what 
you'd get from hand-tuning for Java. I've spent a lot of time trying to 
get decent performance out of Java, not so much with Smalltalk; but the 
performance with Smalltalk was sufficient that it wasn't needed so badly.-- 
Live well,
~wren
Isaac Gouy 1337804865Wed, 23 May 2012 20:27:45 +0000 (UTC)
> From: wren ng thornton 

> Sent: Tuesday, May 22, 2012 9:30 PM-snip-> FWIW, that matches my expectations pretty well. Naive/standard Java performing 
> slower than Smalltalk; highly tweaked Java using non-standard data types 
> performing on-par with or somewhat faster than Smalltalk.I have no difficulty believing that if you are talking about a 1996 Java reference implementation and a 1996 Smalltalk JIT VM.

I could believe that if you are comparing a naive Java program with a highly tweaked Smalltalk program.


> That C is 7x faster is a bit on the high end, but for something like tsort I could imagine it'd be possible.

It's possible because it's possible to write a Java program to be slower than it need be :-)


> Do bear in mind that Java doesn't optimize ---that's the JIT's job 

What are we supposed to make of that?

Why write that and not -- Do bear in mind that Smalltalk doesn't optimize that's the JIT's job -- or -- Do bear in mind that C doesn't optimize that's the compiler's job.


-snip-> But even still, in my experience of using Smalltalk, the standard data 
> structures are much better done and so they will be on-par with what you'd 
> get from hand-tuning for Java. I've spent a lot of time trying to get decent 
> performance out of Java, not so much with Smalltalk; but the performance with 
> Smalltalk was sufficient that it wasn't needed so badly.Do you have a specific example that you can share?
Ryan Newton 1337871150Thu, 24 May 2012 14:52:30 +0000 (UTC)
Oops, forgot to reply-to-all.  This was a minor clarification on Wren's
behalf (he can correct me if I'm wrong).  But I agree with Bryan that it's
time for the thread to die:> > Do bear in mind that Java doesn't optimize ---that's the JIT's job
>
> What are we supposed to make of that?
>
> Why write that and not -- Do bear in mind that Smalltalk doesn't optimize
> that's the JIT's job -- or -- Do bear in mind that C doesn't optimize
> that's the compiler's job.
>I believe this was referring to the fact that javac isn't an aggressive
optimizing compiler on the way from source to bytecode, i.e. it's the
bytecode->asm leg where the optimization effort is focused.

As an outsider to things Java that's something I've had trouble
understanding, actually.  It doesn't seem like an either-or choice to me...

   -RyanOn Wed, May 23, 2012 at 4:26 PM, Isaac Gouy  wrote:

> > From: wren ng thornton 
>
> > Sent: Tuesday, May 22, 2012 9:30 PM
>
> -snip-
> > FWIW, that matches my expectations pretty well. Naive/standard Java
> performing
> > slower than Smalltalk; highly tweaked Java using non-standard data types
> > performing on-par with or somewhat faster than Smalltalk.
>
> I have no difficulty believing that if you are talking about a 1996 Java
> reference implementation and a 1996 Smalltalk JIT VM.
>
> I could believe that if you are comparing a naive Java program with a
> highly tweaked Smalltalk program.
>
>
> > That C is 7x faster is a bit on the high end, but for something like
> tsort I could imagine it'd be possible.
>
> It's possible because it's possible to write a Java program to be slower
> than it need be :-)
>
>
> > Do bear in mind that Java doesn't optimize ---that's the JIT's job
>
> What are we supposed to make of that?
>
> Why write that and not -- Do bear in mind that Smalltalk doesn't optimize
> that's the JIT's job -- or -- Do bear in mind that C doesn't optimize
> that's the compiler's job.
>
>
> -snip-
> > But even still, in my experience of using Smalltalk, the standard data
> > structures are much better done and so they will be on-par with what
> you'd
> > get from hand-tuning for Java. I've spent a lot of time trying to get
> decent
> > performance out of Java, not so much with Smalltalk; but the performance
> with
> > Smalltalk was sufficient that it wasn't needed so badly.
>
> Do you have a specific example that you can share?
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> 
> http://www.haskell.org/mailman/listinfo/haske...
>
Chris Dornan 1337876298Thu, 24 May 2012 16:18:18 +0000 (UTC)
> Oops, forgot to reply-to-all.

Noooooooooooo! You had the right idea the first time. :-)(Please excuse us while we chide you as humorously as we can into putting this thread out of its misery.)

Chris
Roman Werpachowski 1337425149Sat, 19 May 2012 10:59:09 +0000 (UTC)
> Date: Sat, 19 May 2012 08:57:38 +0200
> From: Ertugrul S?ylemez 
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: 
> Message-ID: 
> Content-Type: text/plain; charset="us-ascii"


> Haskell delivers reasonable performance for almost all non-embedded
> applications, and for the extreme edge cases one can still switch to C
> or assembly using the FFI.At the cost of introducing another dimension of complexity (two
codebases, two toolchains, need for programmers specialising in two
languages). There is a reason why many businesses go to great pains to
ensure their codebase is homogeneous (and it's not "managers are
dumb").

RW
wren ng thornton 1337413780Sat, 19 May 2012 07:49:40 +0000 (UTC)
On 5/18/12 7:45 AM, Roman Werpachowski wrote:
> On Fri, 18 May 2012 15:30:09 +1200, "Richard O'Keefe" wrote:
>> The claim was and remains solely that
>> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>>    can be bigger than
>> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>>    and is in this particular case.
>
> Yes, but aren't the differences in the same ballpark, and if we want
> to compare *languages*, we should use identical algorithms to make the
> comparison fair."Fair" in what sense? That is, what _exactly_ are you hoping to compare?

If the goal is to benchmark the implementation of the runtime, VM, or 
built-in types, then requiring the same algorithm makes sense--- because 
the algorithm is irrelevant other than to provide a bunch of calls to 
the runtime/vm/etc. However, benchmarking a language's implementation in 
this way is rarely that helpful. It's great for comparing CPython to 
PyPy (or any other in-language cross-compiler comparison), but what 
would it tell you about Haskell vs C++?

If the goal is to compare, say, production costs for a given level of 
performance, then fixing the algorithm is not at all fair. The fact of 
the matter is that different languages make different algorithms easier 
to (a) implement, and (b) discover/identify/generalize. Thus, when it 
comes to real-world software, the language that makes it easy to 
implement good algorithms has a major advantage--- an advantage which is 
being specifically ignored by fixing the algorithm aforehand.

In order for "fair" to have any meaning whatsoever, we must first 
specify what is being compared, so that we know what it is that things 
are supposed to be fair with regard to.-- 
Live well,
~wren
Isaac Gouy 1337447289Sat, 19 May 2012 17:08:09 +0000 (UTC)
> From: wren ng thornton 
> Sent: Saturday, May 19, 2012 12:49 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?-snip-> "Fair" in what sense? That is, what _exactly_ are you hoping to 
> compare?
> 
> If the goal is to benchmark the implementation of the runtime, VM, or built-in 
> types, then requiring the same algorithm makes sense--- because the algorithm is 
> irrelevant other than to provide a bunch of calls to the runtime/vm/etc. 
> However, benchmarking a language's implementation in this way is rarely that 
> helpful. It's great for comparing CPython to PyPy (or any other in-language 
> cross-compiler comparison), but what would it tell you about Haskell vs C++?The PyPy crowd won't like it if you take programs written for CPython and measure how they run with PyPy - that's "not fair". But it might take a couple of years before they contribute programs optimised for PyPy :-((You already said what it would tell you, but questioned how helpful that would be.)


> If the goal is to compare, say, production costs for a given level of 
> performance, then fixing the algorithm is not at all fair. The fact of the 
> matter is that different languages make different algorithms easier to (a) 
> implement, and (b) discover/identify/generalize. Thus, when it comes to 
> real-world software, the language that makes it easy to implement good 
> algorithms has a major advantage--- an advantage which is being specifically 
> ignored by fixing the algorithm aforehand.Let's just say that's true - Is it useful? What would we need to do to make the comparison?

We could do something like - "Plat_Forms: Is there a single best web development technology? A professional programming contest"

http://page.mi.fu-berlin.de/prechelt/Biblio/p...

But that was just once, with very few teams, and only one problem -- seems like it would need to be repeated more often than is affordable, and with more teams than can be persuaded to donate their time.

Maybe your point is true but practically useless? :-(> In order for "fair" to have any meaning whatsoever, we must first 
> specify what is being compared, so that we know what it is that things are 
> supposed to be fair with regard to.'What does "not fair" mean? (A fable)'    http://stackoverflow.com/a/6380299
Isaac Gouy 1337359081Fri, 18 May 2012 16:38:01 +0000 (UTC)
----- Original Message -----
> From: Richard O'Keefe 
> Sent: Thursday, May 17, 2012 8:30 PM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?----- Original Message -----
> From: Richard O'Keefe 
> Sent: Thursday, May 17, 2012 8:30 PM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
-snip-

> The claim was and remains solely that
> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>    can be bigger than
> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>    and is in this particular case.That seems like a modest and uncontentious claim.> There is also a second claim, which I thought was uncontentious:
> the relative difficulty of tasks varies with language.That doesn't seem unlikely; but I think you'd have to spell out just what you mean by language, and it also doesn't seem unlikely that for the same task the relative difficulty might flip when other details change (the people doing the task, the language implementation, the libraries available for the language implementation...)

It all seems so very particular ;-)
Ad
Home | About | Privacy